Interactive presentation: "Data Exploration and Modeling"

  • Selected dataset: Movie metadata
  • URL: https://www.kaggle.com/datasets/bobirino/movie-metadata

Quantitative and Qualitative Variables

Quantitative Variables

| Name | Variable type | Description |
| --- | --- | --- |
| num_critic_for_reviews | Numeric, discrete | Number of professional critic reviews recorded |
| duration | Numeric, continuous | Movie runtime in minutes |
| director_facebook_likes | Numeric, discrete | Number of Facebook likes for the director |
| actor_3_facebook_likes | Numeric, discrete | Number of Facebook likes for the third lead actor |
| actor_1_facebook_likes | Numeric, discrete | Number of Facebook likes for the lead actor |
| gross | Numeric, continuous | Gross revenue generated by the movie (in dollars) |
| num_voted_users | Numeric, discrete | Number of users who voted for the movie on IMDb |
| cast_total_facebook_likes | Numeric, discrete | Total Facebook likes of the main cast |
| facenumber_in_poster | Numeric, discrete | Number of faces visible on the movie poster |
| num_user_for_reviews | Numeric, discrete | Number of reviews written by users |
| budget | Numeric, continuous | Estimated production budget (in dollars) |
| title_year | Numeric, discrete | Release year of the movie |
| actor_2_facebook_likes | Numeric, discrete | Number of Facebook likes for the second lead actor |
| imdb_score | Numeric, continuous | Average score given by IMDb users (scale 1–10) |
| aspect_ratio | Numeric, continuous | Image aspect ratio (e.g., 1.85, 2.35) |
| movie_facebook_likes | Numeric, discrete | Number of Facebook likes for the movie |

Qualitative Variables

| Name | Variable type | Description |
| --- | --- | --- |
| color | Categorical, nominal | Whether the movie is in color or black and white |
| director_name | Categorical, nominal | Name of the movie's director |
| actor_2_name | Categorical, nominal | Name of the second lead actor |
| genres | Categorical, nominal | Genres associated with the movie (may include several) |
| actor_1_name | Categorical, nominal | Name of the lead actor |
| movie_title | Categorical, nominal | Title of the movie |
| actor_3_name | Categorical, nominal | Name of the third lead actor |
| plot_keywords | Categorical, nominal | Keywords describing the plot |
| movie_imdb_link | Categorical, nominal | URL of the movie's IMDb page |
| language | Categorical, nominal | Main language of the movie |
| country | Categorical, nominal | Country of origin of the movie |
| content_rating | Categorical, ordinal | Content rating (e.g., PG, R) by recommended age |
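
The quantitative/qualitative split above can be approximated programmatically. The following is a minimal sketch, assuming a small hypothetical DataFrame named `sample` that mimics a few of the dataset's columns. It uses pandas `select_dtypes` to separate numeric from non-numeric columns. The dtype alone cannot tell discrete from continuous, or nominal from ordinal, so that part still requires judgment.

```python
import pandas as pd

# Hypothetical sample mimicking a few columns of the dataset (illustrative only).
sample = pd.DataFrame({
    "duration": [178.0, 169.0, 148.0],
    "imdb_score": [7.9, 7.1, 6.8],
    "color": ["Color", "Color", "Color"],
    "content_rating": ["PG-13", "PG-13", "PG-13"],
})

# Numeric dtypes are treated as quantitative; everything else as qualitative.
quantitative_cols = sample.select_dtypes(include="number").columns.tolist()
qualitative_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Quantitative:", quantitative_cols)
print("Qualitative:", qualitative_cols)
```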

Libraries used in the project

In [1]:
import random
import math
import re
import os
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt

from scipy.stats import zscore
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, MultiLabelBinarizer
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from typing import Optional, Sequence, Tuple
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge

Helper functions and classes

Decimal formatting function

In [2]:
def format_decimals(number):
    return f"{number:,.2f}".rstrip('0').rstrip('.')
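
As a quick illustration of what the helper returns, here is a small sketch that repeats the function so the block runs on its own. The input values are arbitrary examples, not taken from the dataset.

```python
def format_decimals(number):
    # Two decimals with a thousands separator, then trailing zeros
    # and a dangling decimal point are trimmed.
    return f"{number:,.2f}".rstrip('0').rstrip('.')

examples = [format_decimals(v) for v in (1234.5, 1000, 0.129)]
print(examples)  # ['1,234.5', '1,000', '0.13']
```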

Custom linear regression class

In [87]:
class CustomLinearRegression():
    def __init__(self, X, y, title, is_interactive = True):
        self.X = X
        self.y = y
        self.title = title
        self.model = LinearRegression()
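        # Start as None so the `is None` guards in plot_feature_importance() and summary() actually trigger.
        self.coef_df: Optional[pd.DataFrame] = None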
        self.performance_metrics = None
        self.is_interactive = is_interactive
        

    def run(self):
        print("--> Iniciando la división del dataset")
        X_train, X_test, y_train, y_test = self.split_data()
        print("-" * 40)

        print("--> Iniciando el entrenamiento del modelo")
        self.fit(X_train, y_train)
        print("-" * 40)

        print("--> Iniciando la predicción del modelo")
        y_pred_test = self.predict(X_test)
        print("-" * 40)

        print("--> Iniciando la evaluación del modelo")
        self.evaluate_model(y_test, y_pred_test)
        print("-" * 40)
        
        print("--> Iniciando la creación del dataframe de coeficientes")
        self.coefficients_per_variable(X_train)
        print("-" * 40)
        
        print("--> Prediciendo sobre entrenamiento y prueba")
        y_train_pred, y_test_pred = self.test_prediction(X_train, X_test)
        print("-" * 40)

        if self.is_interactive:
            print("--> Graficando comparación interactiva del modelo")
            self.plot_comparison_interactive(y_train, y_test, y_train_pred, y_test_pred)
            print("-" * 40)
        else:
            print("--> Graficando comparación del modelo")
            self.plot_comparison(y_train, y_test, y_train_pred, y_test_pred)
            print("-" * 40)
        
        if self.is_interactive:
            print("--> Graficando residuos interactivos")
            self.plot_residuals_interactive(X_test, y_test)
            print("-" * 40)
        else:
            print("--> Graficando residuos")
            self.plot_residuals(X_test, y_test)
            print("-" * 40)
        
        if self.is_interactive:
            print("--> Graficando importancia de variables interactiva")
            self.plot_feature_importance_interactive()
            print("-" * 40)
        else:
            print("--> Graficando importancia de variables")
            self.plot_feature_importance()
            print("-" * 40)
    

    def split_data(self):
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=0.2, random_state=42)
        print(f"\tTamaño del dataset: {len(self.X)}")
        print(f"\tTamaño del dataset de entrenamiento: {len(X_train)}")
        print(f"\tTamaño del dataset de prueba: {len(X_test)}")
        return X_train, X_test, y_train, y_test


    def fit(self, X_train, y_train):
        self.model.fit(X_train, y_train)


    def predict(self, X):
        return self.model.predict(X)


    def evaluate_model(self, y_true, y_pred):
        mae = mean_absolute_error(y_true, y_pred)
        mse = mean_squared_error(y_true, y_pred)
        r2 = r2_score(y_true, y_pred)

        print(f"\tError absoluto medio (MAE): {mae:.2f}")
        print(f"\tError cuadrático medio (MSE): {mse:.2f}")
        print(f"\tCoeficiente de determinación (R²): {r2:.2f}")

        self.performance_metrics = pd.DataFrame({
            'Total Features': [self.X.shape[1]],
            'MAE': [mae],
            'MSE': [mse],
            'R2': [r2]
        })


    def coefficients_per_variable(self, X_train):
        if hasattr(X_train, 'columns'):
            variable_names = X_train.columns
        else:
            n_components = X_train.shape[1]
            variable_names = [f"PC{i+1}" for i in range(n_components)]

        self.coef_df = pd.DataFrame({
            'Variable': variable_names,
            'Coeficiente': self.model.coef_
        }).sort_values(by='Coeficiente', ascending=False).head(10)


    def test_prediction(self, X_train, X_test):
        return self.predict(X_train), self.predict(X_test)

    
    def plot_comparison(self, y_train, y_test, y_train_pred, y_test_pred):
        plt.figure(figsize=(10, 6))

        plt.scatter(y_train, y_train_pred, color='blue', alpha=0.5, label='Entrenamiento')
        plt.scatter(y_test, y_test_pred, color='green', alpha=0.5, label='Prueba')
        plt.plot([self.y.min(), self.y.max()], [self.y.min(), self.y.max()], color='red', linestyle='--', label='Ideal')

        plt.xlabel('IMDb Score Real')
        plt.ylabel('IMDb Score Predicho')
        plt.title('Comparación de Predicciones: Entrenamiento vs Prueba')
        plt.legend()
        plt.grid(True)
        plt.tight_layout()
        plt.show()

    
    def plot_comparison_interactive(self, y_train, y_test, y_train_pred, y_test_pred):
        fig = go.Figure()

        fig.add_trace(go.Scatter(
            x=y_train, y=y_train_pred,
            mode='markers',
            name='Entrenamiento',
            marker=dict(color='blue', opacity=0.5),
            hovertemplate='Real: %{x}<br>Predicho: %{y}'
        ))

        fig.add_trace(go.Scatter(
            x=y_test, y=y_test_pred,
            mode='markers',
            name='Prueba',
            marker=dict(color='green', opacity=0.5),
            hovertemplate='Real: %{x}<br>Predicho: %{y}'
        ))

        fig.add_trace(go.Scatter(
            x=[min(y_train.min(), y_test.min()), max(y_train.max(), y_test.max())],
            y=[min(y_train.min(), y_test.min()), max(y_train.max(), y_test.max())],
            mode='lines',
            name='Ideal',
            line=dict(color='red', dash='dash')
        ))

        fig.update_layout(
            title='Comparación de Predicciones: Entrenamiento vs Prueba',
            xaxis_title='IMDb Score Real',
            yaxis_title='IMDb Score Predicho',
            template='plotly_white',
        )

        fig.show()

        fig_title = f"{self.title}_plot_comparison_interactive.html"
        os.makedirs("assets", exist_ok=True)  # make sure the output folder exists
        fig.write_html(f"assets/{fig_title}")

        print(f"\tGráfico guardado en assets/{fig_title}")

    
    def plot_residuals(self, X_test, y_test):
        y_pred = self.predict(X_test)
        residuals = y_test - y_pred

        plt.figure(figsize=(10, 5))
        plt.scatter(y_pred, residuals, alpha=0.6, color='purple')
        plt.axhline(y=0, color='red', linestyle='--')
        plt.xlabel('Predicción')
        plt.ylabel('Residuo (Real - Predicción)')
        plt.title('Gráfico de Residuos')
        plt.grid(True)
        plt.tight_layout()
        plt.show()


    def plot_residuals_interactive(self, X_test, y_test):
        y_pred = self.predict(X_test)
        residuals = y_test - y_pred

        fig = go.Figure()

        fig.add_trace(go.Scatter(
            x=y_pred, y=residuals,
            mode='markers',
            marker=dict(color='purple', opacity=0.6),
            hovertemplate='Predicción: %{x}<br>Residuo: %{y}'
        ))

        fig.add_trace(go.Scatter(
            x=[min(y_pred), max(y_pred)],
            y=[0, 0],
            mode='lines',
            line=dict(color='red', dash='dash'),
            name='Residuo = 0'
        ))

        fig.update_layout(
            title='Gráfico de Residuos',
            xaxis_title='Predicción',
            yaxis_title='Residuo (Real - Predicción)',
            template='plotly_white'
        )

        fig.show()

        fig_title = f"{self.title}_plot_residuals_interactive.html"
        os.makedirs("assets", exist_ok=True)  # make sure the output folder exists
        fig.write_html(f"assets/{fig_title}")

        print(f"\tGráfico guardado en assets/{fig_title}")


    
    def plot_feature_importance(self, top_n=10):
        if self.coef_df is None:
            print("ERROR: Primero ejecuta `.run()` para calcular los coeficientes.")
            return

        df = self.coef_df.head(top_n).sort_values(by='Coeficiente')
        plt.figure(figsize=(10, 6))
        plt.barh(df['Variable'], df['Coeficiente'], color='teal')
        plt.xlabel('Coeficiente')
        plt.title(f'Top {top_n} Variables más Influyentes')
        plt.grid(True, axis='x')
        plt.tight_layout()
        plt.show()

    
    def plot_feature_importance_interactive(self, top_n=10):
        df = self.coef_df.head(top_n).sort_values(by='Coeficiente')

        fig = go.Figure(go.Bar(
            x=df['Coeficiente'],
            y=df['Variable'],
            orientation='h',
            marker_color='teal',
            hovertemplate='Variable: %{y}<br>Coeficiente: %{x}'
        ))

        fig.update_layout(
            title=f'Top {top_n} Variables más Influyentes',
            xaxis_title='Coeficiente',
            template='plotly_white'
        )
        
        fig.show()
        
        fig_title = f"{self.title}_plot_feature_importance_interactive.html"
        os.makedirs("assets", exist_ok=True)  # make sure the output folder exists
        fig.write_html(f"assets/{fig_title}")

        print(f"\tGráfico guardado en assets/{fig_title}")


    
    def summary(self):
        print("\nRESUMEN DEL MODELO")
        print("-" * 40)

        if self.performance_metrics is not None:
            print("--> Métricas de desempeño:")
            display(self.performance_metrics)
        else:
            print("ERROR: No se han calculado métricas. Ejecuta `.run()` primero.")

        if self.coef_df is not None:
            print("--> Principales coeficientes:")
            display(self.coef_df)
        else:
            print("ERROR: No se han generado coeficientes aún.")
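
Note that `coefficients_per_variable` keeps the ten largest signed coefficients, so strongly negative coefficients do not appear in the "most influential" chart. The following is a minimal sketch of an alternative ranking by absolute magnitude. It runs on synthetic data: the feature names `x0` to `x2` and the true weights are assumptions for illustration, not part of the project. It also assumes the features are on comparable scales, since raw coefficient magnitudes are otherwise not directly comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): y depends strongly on x1 (negatively) and x0, weakly on x2.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x0", "x1", "x2"])
y = 3.0 * X["x0"] - 5.0 * X["x1"] + 0.1 * X["x2"] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

coef_df = pd.DataFrame({"Variable": X.columns, "Coeficiente": model.coef_})

# Order rows by |coefficient| so large negative effects are not dropped.
order = coef_df["Coeficiente"].abs().sort_values(ascending=False).index
ranked = coef_df.reindex(order)
print(ranked)
```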

EDA Visualizer helpers class

In [4]:
class EDAVisualizerHelpers:
    @staticmethod
    def _assert_cols(dataframe: pd.DataFrame, columns: Sequence[str]) -> None:
        missing = [column for column in columns if column not in dataframe.columns]
        if missing:
            raise ValueError(f"Columnas no encontradas en el DataFrame: {missing}")


    @staticmethod
    def _slugify(text: str) -> str:
        text = re.sub(r"[^\w\s-]", "", text, flags=re.UNICODE)
        text = re.sub(r"\s+", "_", text.strip())
        return text


    @staticmethod
    def _ensure_dir(path: str) -> None:
        os.makedirs(path, exist_ok=True)


    @staticmethod
    def _save_plotly_html(fig: go.Figure, title: str, folder: str = "assets") -> str:
        EDAVisualizerHelpers._ensure_dir(folder)
        fname = f"{EDAVisualizerHelpers._slugify(title)}.html"
        out = os.path.join(folder, fname)
        fig.write_html(out)
        return out
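
To make the file naming concrete, the sketch below copies the `_slugify` logic into a standalone function and applies it to a hypothetical chart title. Punctuation is dropped, whitespace becomes underscores, and accented letters are kept because `\w` matches Unicode word characters.

```python
import re

def slugify(text: str) -> str:
    # Standalone copy of EDAVisualizerHelpers._slugify, for illustration only.
    text = re.sub(r"[^\w\s-]", "", text, flags=re.UNICODE)
    text = re.sub(r"\s+", "_", text.strip())
    return text

# Hypothetical title; the resulting name is what would be written under the output folder.
file_name = f"{slugify('Distribución de IMDb Score: Top 10!')}.html"
print(file_name)  # Distribución_de_IMDb_Score_Top_10.html
```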

Static EDA Visualizer class

In [5]:
class EDAVisualizerStatic:
    @staticmethod
    def plot_boxplot(
        title: str, 
        data: pd.DataFrame, 
        x: str, y: str, 
        x_label: Optional[str] = None, y_label: Optional[str] = None,
        figsize : Tuple[int, int] = (20, 10),
        rotation: int = 90,
        y_range: Optional[Tuple[float, float]] = None,
        grid: bool = True, show: bool = True
    ):
        if x_label is None: x_label = x
        if y_label is None: y_label = y

        plt.figure(figsize=figsize)
        sns.boxplot(data=data, x=x, y=y)
        plt.xticks(rotation=rotation)
        plt.title(title)
        plt.xlabel(x_label)
        plt.ylabel(y_label)

        if y_range is not None: plt.ylim(*y_range)
        if grid: plt.grid(axis='y', linestyle='--', alpha=0.9)
        if show: plt.show()


    @staticmethod
    def plot_histogram(
        title: str,
        data: pd.DataFrame,
        column: str,
        x_label: Optional[str] = None, y_label: Optional[str] = "Frecuencia",
        bins: int = 30, kde: bool = True,
        figsize: Tuple[int, int] = (8, 4),
        x_range: Optional[Tuple[float, float]] = None,
        grid: bool = True, show: bool = True,
    ):
        EDAVisualizerHelpers._assert_cols(data, [column])
        # is_numeric_dtype also handles pandas extension dtypes, which np.issubdtype cannot interpret.
        if not pd.api.types.is_numeric_dtype(data[column]):
            raise TypeError(f"La columna '{column}' debe ser numérica para histograma.")

        x_label = column if x_label is None else x_label

        plt.figure(figsize=figsize)
        sns.histplot(data=data, x=column, bins=bins, kde=kde)
        plt.title(title)
        plt.xlabel(x_label)
        plt.ylabel(y_label if y_label else "Frecuencia")

        if x_range is not None: plt.xlim(*x_range)
        if grid: plt.grid(linestyle="--", alpha=0.9)
        if show: plt.show()


    @staticmethod
    def plot_categorical_counts_grid(
        data: pd.DataFrame,
        categorical_cols: Sequence[str],
        n_cols: int = 2,
        top_n: int = 8,
        figsize: Optional[Tuple[int, int]] = None,
        rotate_xticks: int = 45,
        show: bool = True,
    ):
        """
        Grid of count plots, limited to the top-N categories when a column has too many.
        """
        EDAVisualizerHelpers._assert_cols(data, list(categorical_cols))
        n = n_cols
        n_rows = math.ceil(len(categorical_cols) / n)
        if figsize is None:
            figsize = (15, 7 * n_rows)

        plt.figure(figsize=figsize)
        for i, col in enumerate(categorical_cols):
            plt.subplot(n_rows, n, i + 1)
            series = data[col].astype("string")
            n_unique = series.nunique(dropna=True)

            if n_unique <= top_n:
                sns.countplot(data=data, x=col)
                plt.title(col)
            else:
                top_categories = series.value_counts().nlargest(top_n).index
                sns.countplot(data=data[data[col].isin(top_categories)], x=col)
                plt.title(f"{col} (Top {top_n} categorías)")

            plt.xticks(rotation=rotate_xticks)
            plt.xlabel(col)
            plt.ylabel("Frecuencia")

        plt.tight_layout()
        plt.subplots_adjust(hspace=0.5, wspace=0.3)
        
        if show: plt.show()

    
    @staticmethod
    def plot_pairplot(dataset: pd.DataFrame, title: str):
        g = sns.pairplot(dataset)
        g.map_upper(sns.kdeplot, levels=4, color=".2")
        # Title the whole figure; plt.title would only label the last subplot.
        g.figure.suptitle(title, y=1.02)
        plt.show()


    @staticmethod
    def plot_heatmap(correlation_matrix: pd.DataFrame, figsize: Optional[Tuple[int, int]] = None):
        fig, ax = plt.subplots(figsize=figsize)
        sns.heatmap(correlation_matrix, annot=True, fmt=".2f", annot_kws={'size': 16}, ax=ax)

    
    @staticmethod
    def plot_3d_projection(
        dataset: pd.DataFrame, 
        column: str, 
        title: str,
        x_label: str,
        y_label: str,
        z_label: str,
        cbar_label: str,
        figsize: Optional[Tuple[int, int]] = None,
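    ):
        # NOTE: this method reads a module-level `x_pca` array (at least 3 columns) that must be
        # defined before the call; it is not passed in as a parameter.
        fig = plt.figure(figsize=figsize)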
        ax = fig.add_subplot(111, projection='3d')

        scatter = ax.scatter(x_pca[:, 0], x_pca[:, 1], x_pca[:, 2], c=dataset[column], cmap='viridis', s=40) # type: ignore

        ax.set_xlabel(x_label)
        ax.set_ylabel(y_label)
        ax.set_zlabel(z_label)
        ax.set_title(title)

        cbar = plt.colorbar(scatter, ax=ax, shrink=0.5, aspect=10)
        cbar.set_label(cbar_label)

        plt.tight_layout()
        plt.show()
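
`plot_3d_projection` expects a three-component projection to already exist as `x_pca`. The following is a minimal sketch of how such an array could be produced with `StandardScaler` and `PCA`, both imported at the top of the notebook. The random feature matrix is a stand-in, not the project's data, and the notebook's actual preprocessing may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric feature matrix: 100 samples, 5 features.
rng = np.random.default_rng(42)
features = rng.normal(size=(100, 5))

# Standardize first so no single feature dominates the components by scale.
scaled = StandardScaler().fit_transform(features)

pca = PCA(n_components=3)
x_pca = pca.fit_transform(scaled)  # shape: (n_samples, 3)

print(x_pca.shape)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by the 3 components
```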

Interactive EDA Visualizer class

In [6]:
class EDAVisualizerInteractive:
    @staticmethod
    def plot_boxplot(
        title: str,
        data: pd.DataFrame,
        x: str, y: str,
        x_label: Optional[str] = None,
        y_label: Optional[str] = None,
        y_range: Optional[Tuple[float, float]] = None,
        template: str = "plotly_white",
        height: int = 750, width: int = 1350,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
        points: str = "all",  # 'all' | 'outliers' | False
        hover_extra_cols: Optional[Sequence[str]] = None,
    ):
        EDAVisualizerHelpers._assert_cols(data, [x, y])
        if hover_extra_cols:
            EDAVisualizerHelpers._assert_cols(data, list(hover_extra_cols))

        x_label = x if x_label is None else x_label
        y_label = y if y_label is None else y_label

        fig = px.box(
            data_frame=data,
            x=x, y=y,
            points=points,
            hover_data=list(hover_extra_cols) if hover_extra_cols else None,
            title=title,
        )
        fig.update_layout(
            xaxis_title=x_label, yaxis_title=y_label,
            xaxis_tickangle=-90,
            template=template,
            height=height, width=width,
        )
        if y_range is not None: fig.update_yaxes(range=list(y_range))

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Boxplot guardado en {out_path}")
        return out_path


    @staticmethod
    def plot_histogram_by_category(
        title: str,
        subtitle: str,
        data: pd.DataFrame,
        category_col: str,
        value_col: str,
        categories: Optional[Sequence[str]] = None,
        nbins: int = 30,
        template: str = "plotly_white",
        x_label: Optional[str] = None,
        y_label: Optional[str] = "Frecuencia",
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
        height: int = 750,
        width: int = 1350,
    ):
        EDAVisualizerHelpers._assert_cols(data, [category_col, value_col])

        # Validate that the value column is numeric (also works for pandas extension dtypes)
        if not pd.api.types.is_numeric_dtype(data[value_col]):
            raise TypeError(f"'{value_col}' debe ser numérica para histograma.")

        # Keep the original category values (no str conversion) so the equality filter below still matches.
        cat_values = list(categories) if categories else sorted(data[category_col].dropna().unique())

        fig = go.Figure()
        for i, cat in enumerate(cat_values):
            subset = data[data[category_col] == cat]
            fig.add_trace(
                go.Histogram(
                    x=subset[value_col],
                    name=str(cat),
                    visible=(i == 0),
                    nbinsx=nbins,
                    opacity=0.75,
                )
            )

        buttons = []
        for i, cat in enumerate(cat_values):
            visible_mask = [j == i for j in range(len(cat_values))]
            buttons.append(
                dict(
                    label=str(cat),
                    method="update",
                    args=[
                        {"visible": visible_mask},
                        {
                            "title": f"{subtitle} {cat}",
                            "xaxis": {"title": x_label or value_col},
                            "yaxis": {"title": y_label or "Frecuencia"},
                        },
                    ],
                )
            )

        fig.update_layout(
            updatemenus=[dict(active=0, buttons=buttons, x=1.15, y=1.15)],
            title=title if cat_values == [] else f"{subtitle} {cat_values[0]}",
            xaxis_title=x_label or value_col,
            yaxis_title=y_label or "Frecuencia",
            template=template,
            height=height,
            width=width,
            bargap=0.1,
        )

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Histograma guardado en {out_path}")
        return out_path
    

    
    @staticmethod
    def plot_categorical_counts_dropdown(
        title: str,
        data: pd.DataFrame,
        categorical_cols: Sequence[str],
        top_n: int = 8,
        template: str = "plotly_white",
        height: int = 750,
        width: int = 1350,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
    ) -> Optional[str]:
        """
        A single Plotly bar chart with a dropdown to switch between categorical columns.
        Each column is limited to its top-N categories when it has more than that.
        """
        EDAVisualizerHelpers._assert_cols(data, list(categorical_cols))

        fig = go.Figure()
        # Build the traces (one per categorical column)
        for i, col in enumerate(categorical_cols):
            series = data[col].astype("string")
            if series.nunique(dropna=True) <= top_n:
                counts = series.value_counts(dropna=False).sort_values(ascending=False)
            else:
                top_categories = series.value_counts().nlargest(top_n).index
                counts = series[series.isin(top_categories)].value_counts().sort_values(ascending=False)

            fig.add_trace(
                go.Bar(
                    x=counts.index.astype(str),
                    y=counts.values,
                    name=str(col),
                    visible=(i == 0),
                )
            )

        # Dropdown that toggles trace visibility
        buttons = []
        for i, col in enumerate(categorical_cols):
            visible_mask = [j == i for j in range(len(categorical_cols))]
            buttons.append(
                dict(
                    label=str(col),
                    method="update",
                    args=[
                        {"visible": visible_mask},
                        {
                            "title": f"Frecuencia de categorías en {col}",
                            "xaxis": {"title": str(col)},
                            "yaxis": {"title": "Frecuencia"},
                        },
                    ],
                )
            )

        first = str(categorical_cols[0])
        fig.update_layout(
            updatemenus=[dict(active=0, buttons=buttons, x=1.15, y=1.15)],
            title=f"Frecuencia de categorías en {first}",
            xaxis_title=first,
            yaxis_title="Frecuencia",
            template=template,
            height=height,
            width=width,
        )

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Gráfico guardado en {out_path}")
        return out_path
    

    @staticmethod
    def plot_numerical_hists_dropdown(
        title: str,
        data: pd.DataFrame,
        numerical_cols: Sequence[str],
        nbins: int = 30,
        template: str = "plotly_white",
        height: int = 750,
        width: int = 1350,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
    ) -> Optional[str]:
        """
        A single Plotly histogram with a dropdown to switch between numeric columns.
        """
        EDAVisualizerHelpers._assert_cols(data, list(numerical_cols))

        fig = go.Figure()
        for i, col in enumerate(numerical_cols):
            if not pd.api.types.is_numeric_dtype(data[col]):
                raise TypeError(f"'{col}' debe ser numérica para histograma.")
            fig.add_trace(
                go.Histogram(
                    x=data[col],
                    name=str(col),
                    nbinsx=nbins,
                    opacity=0.75,
                    visible=(i == 0),
                )
            )

        buttons = []
        for i, col in enumerate(numerical_cols):
            visible_mask = [j == i for j in range(len(numerical_cols))]
            buttons.append(
                dict(
                    label=str(col),
                    method="update",
                    args=[
                        {"visible": visible_mask},
                        {
                            "title": f"Distribución de {col}",
                            "xaxis": {"title": str(col)},
                            "yaxis": {"title": "Frecuencia"},
                        },
                    ],
                )
            )

        first = str(numerical_cols[0])
        fig.update_layout(
            updatemenus=[dict(active=0, buttons=buttons, x=1.15, y=1.15)],
            title=f"Distribución de {first}",
            xaxis_title=first,
            yaxis_title="Frecuencia",
            template=template,
            height=height,
            width=width,
            bargap=0.1,
        )

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Gráfico guardado en {out_path}")
        return out_path
    

    @staticmethod
    def plot_pairplot(
        dataset: pd.DataFrame, 
        title: str,
        height: int = 1000,
        width: int = 1000,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
    ):
        fig = px.scatter_matrix(
            dataset,
            dimensions=dataset.select_dtypes(include='number').columns,
            color='imdb_score',  # hardcoded: colors points by the target variable, so `dataset` must contain it
            title=title,
            height=height,
            width=width
        )

        fig.update_traces(diagonal_visible=False)  # hide the diagonal panels
        fig.update_layout(template='plotly_white')
        
        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Gráfico guardado en {out_path}")
        return out_path
    

    @staticmethod
    def plot_heatmap(
        correlation_matrix: pd.DataFrame,
        title: str,
        height: int = 1350,
        width: int = 1350,
        font_size: int = 12,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
    ):
        fig = px.imshow(
            correlation_matrix,
            text_auto=True,
            color_continuous_scale='RdBu_r',
            zmin=-1, zmax=1,
            title=title
        )

        fig.update_layout(
            title_font_size=25,
            xaxis_title='Variables',
            yaxis_title='Variables',
            xaxis=dict(tickfont=dict(size=font_size)),
            yaxis=dict(tickfont=dict(size=font_size)),
            template='plotly_white',
            height=height,
            width=width,
        )

        fig.update_traces(textfont_size=font_size)

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Gráfico guardado en {out_path}")
        return out_path
    

    @staticmethod
    def plot_3d_projection(
        dataset: pd.DataFrame,
        title: str,
        x_label: str,
        y_label: str,
        z_label: str,
        label: str,
        show: bool = True,
        save_html: bool = True,
        save_folder: str = "assets",
    ):
        fig = px.scatter_3d(
            dataset, 
            x=x_label, y=y_label, z=z_label,
            color=label,
            color_continuous_scale='Viridis',
            title=title,
            labels={label: label}
        )

        fig.update_traces(marker=dict(size=3))

        if show: fig.show()

        out_path = None
        if save_html:
            out_path = EDAVisualizerHelpers._save_plotly_html(fig, title, folder=save_folder)
            print(f"Gráfico guardado en {out_path}")
        return out_path
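
The dropdown methods in this class all rely on the same mechanism: every category or column gets its own trace, and each dropdown button carries a boolean mask that shows exactly one trace. The following is a minimal, Plotly-free sketch of how those masks are built. The labels are hypothetical, and in the class the resulting list is passed to `updatemenus` in `fig.update_layout`.

```python
# Hypothetical trace labels; in the class above each label corresponds to one Plotly trace.
labels = ["PG", "PG-13", "R"]

buttons = []
for i, label in enumerate(labels):
    # Exactly one True per mask: the trace at position i is shown, all others are hidden.
    visible_mask = [j == i for j in range(len(labels))]
    buttons.append({
        "label": label,
        "method": "update",
        "args": [{"visible": visible_mask}, {"title": f"Distribution of {label}"}],
    })

for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

Because the mask is positional, the order of the buttons has to match the order in which the traces were added to the figure.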

Loading the dataset

In [7]:
file_name = "movie_metadata.csv"
dataset = pd.read_csv(file_name)

Dataset description

Number of records and columns

In [8]:
dataset.shape
Out[8]:
(5043, 28)

Dataset summary

In [9]:
print(f"\033[1mInference:\033[0m The dataset consists of {dataset.shape[1]} features and {dataset.shape[0]} examples")
Inference: The dataset consists of 28 features and 5043 examples

Dataset columns

In [10]:
dataset.keys()
Out[10]:
Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

First 5 and last 5 records of the dataset

In [11]:
dataset.head()
Out[11]:
color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 0.0 855.0 Joel David Moore 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 563.0 1000.0 Orlando Bloom 40000.0 309404152.0 Action|Adventure|Fantasy ... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 0.0 161.0 Rory Kinnear 11000.0 200074175.0 Action|Adventure|Thriller ... 994.0 English UK PG-13 245000000.0 2015.0 393.0 6.8 2.35 85000
3 Color Christopher Nolan 813.0 164.0 22000.0 23000.0 Christian Bale 27000.0 448130642.0 Action|Thriller ... 2701.0 English USA PG-13 250000000.0 2012.0 23000.0 8.5 2.35 164000
4 NaN Doug Walker NaN NaN 131.0 NaN Rob Walker 131.0 NaN Documentary ... NaN NaN NaN NaN NaN NaN 12.0 7.1 NaN 0

5 rows × 28 columns
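
The `genres` column in the preview above packs several labels into one pipe-separated string, so counting the raw strings would treat each combination as its own category. A minimal sketch of two common ways to unpack it, using the first three records as a toy frame (the variable names are illustrative):

```python
import pandas as pd

# Genres of the first three records shown in the preview.
toy = pd.DataFrame({
    "director_name": ["James Cameron", "Gore Verbinski", "Sam Mendes"],
    "genres": [
        "Action|Adventure|Fantasy|Sci-Fi",
        "Action|Adventure|Fantasy",
        "Action|Adventure|Thriller",
    ],
})

# Option 1: one row per (movie, genre) pair, then count each genre.
genre_counts = toy["genres"].str.split("|").explode().value_counts()

# Option 2: one 0/1 indicator column per genre, aligned with the original rows.
genre_indicators = toy["genres"].str.get_dummies(sep="|")

print(genre_counts)
print(genre_indicators)
```

The exploded form suits frequency plots, while the indicator form is usually the more convenient input for a model.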

In [12]:
dataset.tail()
Out[12]:
color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
5038 Color Scott Smith 1.0 87.0 2.0 318.0 Daphne Zuniga 637.0 NaN Comedy|Drama ... 6.0 English Canada NaN NaN 2013.0 470.0 7.7 NaN 84
5039 Color NaN 43.0 43.0 NaN 319.0 Valorie Curry 841.0 NaN Crime|Drama|Mystery|Thriller ... 359.0 English USA TV-14 NaN NaN 593.0 7.5 16.00 32000
5040 Color Benjamin Roberds 13.0 76.0 0.0 0.0 Maxwell Moody 0.0 NaN Drama|Horror|Thriller ... 3.0 English USA NaN 1400.0 2013.0 0.0 6.3 NaN 16
5041 Color Daniel Hsia 14.0 100.0 0.0 489.0 Daniel Henney 946.0 10443.0 Comedy|Drama|Romance ... 9.0 English USA PG-13 NaN 2012.0 719.0 6.3 2.35 660
5042 Color Jon Gunn 43.0 90.0 16.0 16.0 Brian Herzlinger 86.0 85222.0 Documentary ... 84.0 English USA PG 1100.0 2004.0 23.0 6.6 1.85 456

5 rows × 28 columns
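
Both previews contain `NaN` entries, so a per-column count of missing values is a natural next check. The sketch below assumes a toy frame holding four columns of records 5038 to 5040 from the `tail()` output; `summary` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Four columns of records 5038-5040, copied from the tail() preview.
toy = pd.DataFrame({
    "director_name": ["Scott Smith", np.nan, "Benjamin Roberds"],
    "gross": [np.nan, np.nan, np.nan],
    "title_year": [2013.0, np.nan, 2013.0],
    "imdb_score": [7.7, 7.5, 6.3],
})

# isna() marks missing cells; the sum counts them, the mean gives their share.
summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "percent": (toy.isna().mean() * 100).round(1),
}).sort_values("missing", ascending=False)

print(summary)
```

On the full dataset the same two lines would be applied to `dataset` instead of `toy`.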

Check the number of unique values in each feature

In [13]:
dataset.nunique().sort_values()
Out[13]:
color                           2
content_rating                 18
facenumber_in_poster           19
aspect_ratio                   22
language                       46
country                        65
imdb_score                     78
title_year                     91
duration                      191
director_facebook_likes       435
budget                        439
num_critic_for_reviews        528
movie_facebook_likes          876
actor_1_facebook_likes        878
actor_3_facebook_likes        906
genres                        914
actor_2_facebook_likes        917
num_user_for_reviews          954
actor_1_name                 2097
director_name                2398
actor_2_name                 3032
actor_3_name                 3521
cast_total_facebook_likes    3978
gross                        4035
plot_keywords                4760
num_voted_users              4826
movie_title                  4917
movie_imdb_link              4919
dtype: int64
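
Note that `nunique()` skips missing values by default, which is why `color` is reported with 2 distinct values here while a listing based on `unique()` reports 3. A small sketch of the difference, using a toy series with the three `color` values found in the dataset:

```python
import numpy as np
import pandas as pd

# The three values that appear in the color column, one of them missing.
color = pd.Series(["Color", np.nan, " Black and White", "Color"], name="color")

print(color.nunique())              # 2: NaN is ignored by default
print(color.nunique(dropna=False))  # 3: NaN counted as its own value
print(len(color.unique()))          # 3: unique() always keeps NaN
```

For columns without missing values, such as `genres` with 914 distinct strings, both approaches agree.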

List of unique values per column

In [14]:
for key in dataset.keys():
    # unique() keeps NaN as a value, so this count can exceed nunique() by one.
    unique = dataset[key].unique()
    print(f"\033[1m{key}:\033[0m")
    print(f"\t- Total de datos únicos: {len(unique)}")
    print(f"\t- Valores: {unique}\n")
color:
	- Total de datos únicos: 3
	- Valores: ['Color' nan ' Black and White']

director_name:
	- Total de datos únicos: 2399
	- Valores: ['James Cameron' 'Gore Verbinski' 'Sam Mendes' ... 'Scott Smith'
 'Benjamin Roberds' 'Daniel Hsia']

num_critic_for_reviews:
	- Total de datos únicos: 529
	- Valores: [723. 302. 602. 813.  nan 462. 392. 324. 635. 375. 673. 434. 403. 313.
 450. 733. 258. 703. 448. 451. 422. 599. 343. 509. 251. 446. 315. 516.
 377. 644. 750. 300. 608. 334. 376. 366. 378. 525. 495. 469. 304. 436.
 453. 424. 654. 539. 590. 338. 490. 306. 575. 428. 470. 298. 488. 322.
 421. 162. 367. 240. 384. 248. 284. 396. 645. 408. 219. 486. 682.  85.
 264. 418. 186. 585.  91. 250. 536. 370. 416. 401. 521.  10. 218. 576.
 226. 443. 188. 286. 288. 280. 653. 712. 642.   1. 187. 362. 500. 389.
 235. 231. 227. 275. 474. 228. 191. 329. 295. 318. 323. 276. 478. 167.
 185. 350. 245. 406. 739. 225. 145. 310. 526. 465. 357. 194. 339. 132.
 135. 256. 196. 220. 211. 464. 208. 287. 210. 432. 190. 314. 518. 291.
 292. 184. 141. 267. 351. 163. 166. 510. 197. 244. 156. 354.  21. 252.
 556. 153. 266. 517. 502. 165.  94. 246. 330. 440. 274. 349. 154. 233.
 271.   4. 294. 159. 289. 342. 382. 344. 183. 175. 239. 237. 262. 552.
 102. 775.  71. 476. 207. 492. 168. 283. 359. 320. 257.  33. 152. 348.
 738.  93. 181. 369. 179. 358. 160. 192. 198. 263. 447.  29. 172. 104.
 327. 125.  79. 326. 297. 174. 109. 101. 568.  62. 265. 232. 400. 230.
 180.  81. 765.  80. 383. 193. 170. 333. 203. 321. 606. 144. 511. 212.
 127.  78.  66.  97. 202. 136. 169. 200. 255. 173. 221.  82. 308. 301.
 328. 199. 355. 529. 412. 106.  61. 217. 316. 352. 143. 148. 415. 146.
  70. 269. 253. 281. 122. 157.  64. 142.  84. 201.  47. 114. 206. 222.
 103. 236. 238. 107. 459. 151. 229. 158.  98. 393. 149. 138. 345. 120.
 234. 134. 139. 155. 204.  95. 215. 325.  53.  46. 147. 178. 209.  19.
  31. 129. 124.  35. 137. 121.  87.  63. 113. 205. 123. 272.   3. 140.
 150. 119.  49. 177. 372. 290. 164.  12. 241. 161.  89. 131.  67. 130.
  74. 435. 117. 108. 176. 299. 128.  88. 261.  73.  75.  39. 247.  59.
 388. 371.  76. 105. 242. 360. 112. 189.  92.  51. 293.  40.  90. 538.
 307.  72.  86. 279.  96.  14.  60. 361.  42.  68. 596. 460. 249. 213.
 118.  77. 171. 387. 110.  83.  34.   8. 223. 111. 100. 115.  54. 579.
  56.  20.  57. 133.  26. 491.  55.  45. 224.  50.  44. 277. 391. 216.
 558. 413. 457.  65. 116. 419.   2.  36.  99. 259. 356.  22. 214.  28.
 296.  30.  48. 195.  32.  43. 454. 398.  38.  25.  23.  24. 656. 270.
  27.   9. 433. 319.  41. 374. 341.  16. 420. 303. 260. 335. 273.  37.
 546. 437. 126.   5. 340. 493. 332. 405. 285.  13. 584.  58.  52. 254.
 522. 441.  15.  18. 449. 472. 268. 452. 589. 487. 305. 397.  69. 634.
 417. 368.   7.  17. 426. 309. 373. 317. 336. 365. 445. 574. 394.   6.
 423. 466.  11. 549. 597. 364. 282. 427. 390. 182. 588. 543. 479. 676.
 278. 414. 331. 669. 489. 399. 385. 363. 410. 535. 386. 439. 346. 534.
 411. 471. 444. 548. 425. 337. 533. 311. 663. 481. 409.]

duration:
	- Total de datos únicos: 192
	- Valores: [178. 169. 148. 164.  nan 132. 156. 100. 141. 153. 183. 106. 151. 150.
 143. 173. 136. 186. 113. 201. 194. 147. 131. 124. 135. 195. 108. 104.
 165. 130. 142. 125. 123. 103. 118. 140. 149. 114. 116. 154. 122.  93.
  98.  91. 158.  96. 127. 110. 144. 152.  94. 126. 112. 176.  95.  97.
 109. 128. 102. 101. 120. 121. 182. 166. 137. 184. 206. 138. 157. 115.
 111.  89. 105. 119. 129. 146.  88.  99.  90.  85.  92. 196. 133. 215.
  60. 117. 107.  82. 159. 174. 134.  77. 170.  76. 171.  84.  22. 145.
  78. 240. 172.  87. 216. 192.  44.  83. 139.  86. 162.  54.  80.  25.
  74.  81. 177.  73.  43.  45. 163.  30. 212. 187. 189. 188. 280. 155.
  64. 190.  75. 220. 160.  52. 325. 251. 202. 330. 289. 161.  28.  79.
  63. 511.  42. 167. 193. 175. 185. 219.   7. 271.  50.  72.  24.  68.
 225. 236. 180. 334. 270. 227. 286.  65.  55.  41.  69. 293. 200.  40.
 168. 197. 181. 300.  23.  53.  46.  67. 199. 226.  37.  11.  66.  34.
  20.  27.  70.  14.  71.  58.  35.  59.  62.  47.]

director_facebook_likes:
	- Total de datos únicos: 436
	- Valores: [0.00e+00 5.63e+02 2.20e+04 1.31e+02 4.75e+02 1.50e+01 2.82e+02 3.95e+02
 8.00e+01 2.52e+02 1.88e+02 4.64e+02 1.29e+02 9.40e+01 5.32e+02 3.65e+02
 1.00e+03 1.30e+04 4.20e+02 3.70e+01 3.64e+02 4.87e+02 2.58e+02 1.25e+02
 3.68e+02 1.40e+04 1.79e+02 1.13e+02 5.60e+01 6.81e+02 7.76e+02 1.10e+01
 4.00e+03 1.70e+04 3.57e+02 4.52e+02 2.93e+02 2.18e+02 5.80e+01 2.08e+02
 2.74e+02 1.71e+02 1.98e+02 5.96e+02 4.70e+01 3.10e+01 6.63e+02 3.80e+01
 6.60e+01 2.55e+02 8.40e+01 5.71e+02 2.80e+01 2.10e+04 9.05e+02 5.08e+02
 2.26e+02 2.49e+02 3.30e+01 5.00e+01 2.30e+02 1.50e+02 3.50e+01 1.89e+02
 1.51e+02 6.90e+01 7.50e+02 2.00e+03 5.90e+01 1.20e+01 4.73e+02 3.94e+02
 9.00e+01 2.50e+01 4.20e+01 4.56e+02 9.30e+01 1.76e+02 5.00e+00 5.20e+01
 2.30e+01 3.80e+02 2.95e+02 5.03e+02 2.09e+02 6.00e+00 6.08e+02 3.86e+02
      nan 1.30e+01 5.21e+02 5.40e+01 2.35e+02 9.60e+01 1.24e+02 1.07e+02
 7.19e+02 3.23e+02 5.41e+02 6.10e+02 1.67e+02 1.60e+02 6.62e+02 1.23e+02
 2.94e+02 4.46e+02 1.60e+01 1.90e+01 7.90e+01 1.28e+02 6.20e+01 5.50e+01
 2.63e+02 6.70e+01 1.01e+02 1.53e+02 3.40e+01 6.30e+01 5.70e+01 1.20e+04
 2.85e+02 1.60e+04 2.10e+01 1.00e+01 1.65e+02 1.40e+01 7.70e+01 2.07e+02
 6.70e+02 2.60e+01 3.85e+02 2.00e+01 3.42e+02 6.11e+02 9.00e+00 1.16e+02
 1.27e+02 4.40e+01 8.10e+01 7.00e+01 2.12e+02 1.02e+02 9.70e+01 7.00e+00
 3.35e+02 2.21e+02 8.70e+01 4.68e+02 3.78e+02 5.45e+02 2.66e+02 3.60e+01
 2.78e+02 1.68e+02 9.90e+01 7.63e+02 8.80e+01 4.80e+02 7.50e+01 9.10e+01
 1.63e+02 1.54e+02 3.33e+02 1.17e+02 3.00e+01 3.01e+02 4.25e+02 4.00e+01
 4.38e+02 6.50e+01 9.20e+01 4.30e+01 6.40e+01 2.87e+02 1.80e+01 3.09e+02
 4.50e+01 2.75e+02 8.45e+02 1.26e+02 2.70e+01 2.72e+02 1.09e+02 7.20e+01
 1.70e+01 3.83e+02 4.10e+01 2.53e+02 2.20e+01 4.88e+02 1.30e+02 9.06e+02
 2.40e+01 8.00e+00 1.05e+02 2.90e+01 3.00e+03 4.48e+02 4.00e+00 7.60e+01
 7.59e+02 1.10e+04 1.19e+02 6.87e+02 1.38e+02 4.36e+02 1.75e+02 3.22e+02
 7.08e+02 1.97e+02 4.80e+01 3.20e+01 2.34e+02 7.37e+02 1.81e+02 1.62e+02
 2.00e+00 5.30e+01 3.90e+01 1.92e+02 8.92e+02 3.00e+00 1.43e+02 5.10e+01
 8.30e+01 1.08e+02 6.10e+01 3.17e+02 8.20e+01 6.07e+02 1.59e+02 1.61e+02
 4.22e+02 7.10e+01 8.69e+02 1.80e+02 2.77e+02 2.41e+02 1.55e+02 1.48e+02
 1.52e+02 1.74e+02 2.13e+02 6.44e+02 7.30e+01 1.34e+02 2.60e+02 9.80e+01
 6.28e+02 1.18e+02 3.75e+02 6.31e+02 2.70e+02 3.50e+02 7.77e+02 4.90e+01
 8.50e+01 3.26e+02 1.70e+02 5.17e+02 8.35e+02 3.11e+02 8.90e+01 4.60e+01
 3.08e+02 1.33e+02 7.80e+01 1.64e+02 4.15e+02 1.90e+02 6.43e+02 5.84e+02
 5.34e+02 3.04e+02 8.83e+02 3.38e+02 2.51e+02 5.29e+02 4.53e+02 1.94e+02
 6.00e+03 1.32e+02 1.36e+02 1.00e+02 2.38e+02 6.55e+02 7.29e+02 2.46e+02
 3.43e+02 1.50e+04 5.49e+02 1.10e+02 1.14e+02 1.49e+02 4.19e+02 2.65e+02
 1.15e+02 7.10e+02 7.67e+02 1.20e+02 1.44e+02 6.50e+02 3.53e+02 1.87e+02
 2.48e+02 9.73e+02 6.80e+01 2.19e+02 1.21e+02 2.14e+02 4.60e+02 9.11e+02
 5.35e+02 5.97e+02 2.69e+02 1.80e+04 9.56e+02 3.00e+02 6.88e+02 1.95e+02
 4.06e+02 1.40e+02 1.22e+02 3.10e+02 2.10e+02 9.09e+02 3.29e+02 5.00e+02
 3.79e+02 3.37e+02 1.57e+02 5.12e+02 8.47e+02 2.01e+02 1.37e+02 4.05e+02
 3.69e+02 1.72e+02 9.29e+02 3.74e+02 7.45e+02 5.61e+02 4.54e+02 7.56e+02
 1.41e+02 4.45e+02 2.32e+02 7.52e+02 4.12e+02 7.99e+02 1.39e+02 2.61e+02
 2.20e+02 6.00e+01 1.66e+02 2.16e+02 3.19e+02 6.67e+02 4.82e+02 4.34e+02
 3.87e+02 7.87e+02 3.24e+02 5.93e+02 3.02e+02 8.00e+02 1.47e+02 7.98e+02
 9.30e+02 2.36e+02 4.07e+02 5.48e+02 3.41e+02 2.00e+02 3.46e+02 4.74e+02
 5.54e+02 2.28e+02 4.72e+02 1.12e+02 7.35e+02 7.81e+02 2.30e+04 5.92e+02
 2.22e+02 4.40e+02 3.99e+02 3.45e+02 5.20e+02 2.39e+02 2.04e+02 7.66e+02
 3.77e+02 3.58e+02 4.50e+02 6.75e+02 3.30e+02 2.27e+02 1.77e+02 1.03e+02
 1.04e+02 6.73e+02 9.64e+02 3.93e+02 7.00e+02 7.40e+01 3.55e+02 1.84e+02
 4.21e+02 5.89e+02 6.03e+02 2.44e+02 5.31e+02 2.00e+04 1.91e+02 5.22e+02
 7.64e+02 1.35e+02 1.99e+02 2.43e+02 9.23e+02 1.42e+02 2.24e+02 2.47e+02
 9.50e+01 4.90e+02 2.17e+02 4.31e+02 3.73e+02 6.86e+02 6.64e+02 4.67e+02
 9.69e+02 1.58e+02 3.97e+02 2.91e+02]

actor_3_facebook_likes:
	- Total de datos únicos: 907
	- Valores: [8.55e+02 1.00e+03 1.61e+02 2.30e+04      nan 5.30e+02 4.00e+03 2.84e+02
 1.90e+04 1.00e+04 2.00e+03 9.03e+02 3.93e+02 7.48e+02 2.01e+02 7.18e+02
 7.73e+02 9.63e+02 7.38e+02 8.40e+01 7.94e+02 1.10e+04 6.27e+02 3.00e+03
 5.60e+02 7.60e+02 4.64e+02 8.08e+02 8.25e+02 7.76e+02 3.26e+02 7.21e+02
 9.88e+02 1.40e+04 2.00e+04 9.28e+02 1.40e+02 7.70e+01 2.36e+02 9.19e+02
 5.81e+02 1.13e+02 8.38e+02 1.05e+02 5.22e+02 1.73e+02 3.10e+02 1.30e+04
 1.03e+02 8.20e+01 2.62e+02 4.59e+02 5.82e+02 5.95e+02 3.29e+02 7.00e+03
 5.09e+02 6.00e+01 5.70e+02 3.84e+02 5.91e+02 8.46e+02 8.84e+02 2.83e+02
 9.82e+02 2.13e+02 6.04e+02 5.62e+02 8.33e+02 2.67e+02 5.35e+02 7.59e+02
 1.91e+02 6.00e+03 1.20e+01 3.70e+02 7.02e+02 6.92e+02 6.48e+02 5.90e+01
 6.91e+02 6.87e+02 9.79e+02 5.58e+02 5.88e+02 9.54e+02 4.36e+02 2.33e+02
 4.90e+02 3.00e+01 1.20e+04 9.43e+02 2.94e+02 6.99e+02 1.82e+02 5.02e+02
 1.60e+04 6.41e+02 1.62e+02 8.26e+02 1.50e+01 3.46e+02 2.56e+02 4.33e+02
 5.86e+02 3.94e+02 5.17e+02 8.44e+02 7.46e+02 3.22e+02 5.37e+02 9.64e+02
 9.67e+02 6.90e+02 4.45e+02 9.81e+02 9.53e+02 6.53e+02 4.13e+02 9.10e+01
 2.58e+02 9.34e+02 8.48e+02 4.22e+02 2.44e+02 1.68e+02 8.82e+02 1.84e+02
 3.58e+02 7.33e+02 4.63e+02 9.39e+02 1.79e+02 8.83e+02 6.80e+02 5.23e+02
 1.83e+02 8.07e+02 8.77e+02 3.97e+02 2.82e+02 8.00e+03 8.45e+02 1.65e+02
 8.94e+02 3.88e+02 1.59e+02 6.45e+02 3.62e+02 5.60e+01 1.54e+02 8.50e+02
 2.17e+02 2.41e+02 6.02e+02 4.09e+02 6.36e+02 8.12e+02 4.61e+02 3.41e+02
 4.02e+02 2.65e+02 5.67e+02 8.70e+01 1.57e+02 9.29e+02 1.41e+02 6.17e+02
 9.00e+03 4.29e+02 1.30e+01 2.68e+02 7.99e+02 7.80e+01 3.87e+02 3.50e+02
 8.27e+02 1.16e+02 7.41e+02 4.32e+02 1.11e+02 4.21e+02 5.44e+02 4.00e+01
 2.02e+02 4.30e+02 2.97e+02 8.57e+02 4.47e+02 7.80e+02 4.34e+02 4.66e+02
 1.95e+02 5.25e+02 3.72e+02 6.95e+02 5.33e+02 8.34e+02 5.39e+02 5.61e+02
 6.18e+02 3.85e+02 7.08e+02 1.07e+02 3.83e+02 5.42e+02 2.53e+02 2.03e+02
 2.79e+02 1.20e+02 4.52e+02 4.23e+02 5.77e+02 5.66e+02 4.67e+02 7.20e+02
 5.26e+02 2.27e+02 5.71e+02 6.83e+02 9.50e+01 3.60e+01 2.80e+01 9.70e+02
 4.65e+02 5.57e+02 7.22e+02 4.42e+02 9.16e+02 4.16e+02 7.66e+02 2.40e+02
 9.68e+02 5.27e+02 7.79e+02 3.30e+02 4.76e+02 5.50e+01 8.10e+02 5.68e+02
 1.17e+02 4.84e+02 9.18e+02 5.85e+02 4.89e+02 7.27e+02 6.35e+02 5.03e+02
 6.81e+02 4.41e+02 1.35e+02 2.49e+02 8.86e+02 3.07e+02 2.81e+02 9.57e+02
 6.12e+02 1.23e+02 5.54e+02 5.99e+02 2.72e+02 9.71e+02 1.29e+02 9.36e+02
 4.71e+02 5.00e+03 1.18e+02 1.48e+02 8.20e+02 4.39e+02 6.19e+02 6.24e+02
 2.90e+01 2.10e+02 5.51e+02 1.72e+02 8.09e+02 7.19e+02 1.06e+02 2.71e+02
 2.00e+01 6.15e+02 5.21e+02 1.60e+02 4.95e+02 9.56e+02 8.98e+02 5.75e+02
 3.90e+02 9.25e+02 4.43e+02 8.52e+02 1.10e+02 3.43e+02 1.63e+02 2.08e+02
 8.05e+02 8.18e+02 0.00e+00 7.44e+02 9.15e+02 9.13e+02 3.66e+02 3.80e+01
 7.10e+01 6.70e+01 6.97e+02 3.75e+02 2.69e+02 4.27e+02 8.59e+02 4.74e+02
 6.20e+01 1.02e+02 6.58e+02 8.71e+02 8.00e+00 9.04e+02 4.10e+01 2.31e+02
 1.40e+01 7.00e+00 2.12e+02 7.40e+01 8.47e+02 1.80e+01 8.76e+02 5.53e+02
 7.71e+02 3.27e+02 9.24e+02 7.45e+02 9.33e+02 5.05e+02 3.28e+02 2.60e+02
 2.32e+02 2.18e+02 4.46e+02 1.50e+04 4.12e+02 7.87e+02 1.04e+02 1.38e+02
 1.93e+02 9.90e+01 2.22e+02 5.41e+02 6.37e+02 2.15e+02 4.15e+02 7.15e+02
 1.30e+02 9.89e+02 5.74e+02 4.03e+02 4.62e+02 2.42e+02 4.80e+01 7.69e+02
 9.95e+02 7.26e+02 5.76e+02 5.59e+02 8.41e+02 9.92e+02 6.38e+02 1.70e+01
 7.74e+02 4.00e+02 1.50e+02 3.24e+02 6.60e+01 4.07e+02 2.98e+02 8.54e+02
 3.01e+02 2.46e+02 4.97e+02 5.80e+02 9.75e+02 9.70e+01 3.34e+02 5.31e+02
 9.11e+02 3.59e+02 1.12e+02 9.12e+02 8.35e+02 3.54e+02 6.55e+02 5.52e+02
 7.75e+02 9.40e+01 3.69e+02 6.25e+02 1.74e+02 5.20e+02 9.74e+02 2.63e+02
 5.06e+02 2.50e+01 2.21e+02 1.60e+01 4.81e+02 6.68e+02 6.72e+02 7.23e+02
 5.63e+02 8.67e+02 6.64e+02 6.93e+02 3.11e+02 7.30e+01 3.45e+02 4.75e+02
 1.45e+02 4.05e+02 7.54e+02 3.57e+02 3.80e+02 5.79e+02 8.11e+02 6.00e+00
 1.75e+02 2.26e+02 5.70e+01 7.51e+02 4.51e+02 7.67e+02 6.26e+02 4.60e+02
 7.16e+02 4.88e+02 2.88e+02 8.64e+02 2.95e+02 6.42e+02 9.44e+02 3.00e+02
 4.85e+02 2.04e+02 3.79e+02 8.96e+02 2.30e+02 2.06e+02 4.80e+02 9.06e+02
 3.08e+02 6.52e+02 5.69e+02 9.47e+02 5.07e+02 6.80e+01 7.78e+02 1.96e+02
 4.37e+02 2.80e+02 2.20e+01 3.61e+02 4.40e+02 3.18e+02 5.43e+02 5.12e+02
 3.49e+02 8.39e+02 5.29e+02 5.19e+02 4.79e+02 6.22e+02 4.55e+02 5.80e+01
 6.76e+02 6.50e+02 3.48e+02 1.94e+02 6.39e+02 7.52e+02 4.58e+02 7.29e+02
 1.86e+02 6.65e+02 6.40e+02 7.20e+01 2.43e+02 3.40e+01 5.94e+02 3.52e+02
 5.18e+02 2.87e+02 4.60e+01 1.77e+02 5.34e+02 5.65e+02 8.89e+02 4.20e+02
 4.30e+01 7.53e+02 3.03e+02 3.04e+02 9.31e+02 6.34e+02 2.10e+01 7.50e+01
 7.90e+01 7.28e+02 5.93e+02 5.97e+02 2.85e+02 8.78e+02 8.28e+02 4.17e+02
 6.51e+02 5.84e+02 7.01e+02 2.77e+02 2.93e+02 8.74e+02 6.60e+02 1.32e+02
 4.77e+02 1.44e+02 4.14e+02 6.79e+02 9.35e+02 3.68e+02 7.06e+02 3.23e+02
 1.58e+02 7.17e+02 3.90e+01 5.00e+00 4.01e+02 5.11e+02 7.64e+02 3.98e+02
 7.30e+02 7.95e+02 3.17e+02 6.00e+02 8.97e+02 5.92e+02 3.38e+02 8.30e+02
 2.37e+02 2.35e+02 2.61e+02 8.21e+02 4.82e+02 2.30e+01 3.02e+02 6.77e+02
 3.19e+02 3.64e+02 2.55e+02 6.28e+02 7.50e+02 5.14e+02 5.48e+02 6.31e+02
 7.34e+02 5.01e+02 4.04e+02 1.67e+02 2.52e+02 5.10e+02 1.90e+01 1.00e+01
 6.10e+01 8.65e+02 5.00e+01 1.14e+02 5.00e+02 7.89e+02 8.93e+02 6.43e+02
 6.78e+02 2.25e+02 5.47e+02 7.49e+02 4.91e+02 2.92e+02 3.95e+02 2.75e+02
 3.65e+02 5.20e+01 9.96e+02 9.60e+01 5.40e+01 2.96e+02 1.27e+02 2.51e+02
 3.36e+02 8.10e+01 4.90e+01 9.23e+02 2.57e+02 4.00e+00 6.33e+02 1.19e+02
 3.89e+02 4.18e+02 8.23e+02 2.11e+02 4.73e+02 2.14e+02 2.23e+02 6.40e+01
 4.40e+01 3.53e+02 6.74e+02 6.16e+02 6.85e+02 3.51e+02 5.72e+02 7.62e+02
 1.71e+02 6.13e+02 2.59e+02 2.19e+02 4.72e+02 2.00e+00 4.83e+02 5.96e+02
 1.00e+02 7.85e+02 4.78e+02 4.94e+02 3.77e+02 1.08e+02 7.36e+02 7.24e+02
 3.50e+01 6.44e+02 1.33e+02 4.50e+02 5.10e+01 5.38e+02 2.78e+02 9.45e+02
 8.00e+01 4.26e+02 1.90e+02 6.46e+02 2.48e+02 3.63e+02 3.74e+02 2.29e+02
 9.73e+02 6.30e+01 5.49e+02 3.44e+02 6.63e+02 1.36e+02 1.34e+02 4.48e+02
 1.15e+02 4.50e+01 8.16e+02 2.39e+02 1.10e+01 1.78e+02 5.04e+02 7.96e+02
 5.73e+02 3.55e+02 8.30e+01 1.42e+02 2.45e+02 3.20e+02 2.40e+01 6.70e+02
 1.25e+02 9.00e+01 7.81e+02 8.02e+02 9.46e+02 3.60e+02 5.45e+02 7.86e+02
 9.40e+02 9.66e+02 8.37e+02 3.70e+01 2.86e+02 4.24e+02 4.68e+02 3.42e+02
 3.92e+02 2.73e+02 4.38e+02 6.90e+01 4.31e+02 9.00e+02 7.98e+02 8.43e+02
 1.87e+02 7.00e+02 3.13e+02 3.73e+02 8.90e+01 2.50e+02 4.70e+01 3.25e+02
 9.02e+02 4.53e+02 8.99e+02 2.28e+02 3.00e+00 1.37e+02 1.70e+04 5.28e+02
 1.55e+02 2.60e+01 1.51e+02 7.11e+02 1.98e+02 1.64e+02 6.62e+02 1.88e+02
 8.87e+02 9.17e+02 8.72e+02 3.67e+02 7.42e+02 9.60e+02 1.99e+02 7.43e+02
 9.77e+02 9.80e+01 3.76e+02 3.47e+02 7.00e+01 3.99e+02 9.20e+01 3.09e+02
 4.20e+01 4.92e+02 3.10e+01 6.11e+02 1.97e+02 2.54e+02 3.82e+02 8.80e+01
 2.34e+02 8.60e+01 2.66e+02 2.90e+02 2.16e+02 4.06e+02 1.49e+02 2.20e+02
 8.42e+02 6.05e+02 8.50e+01 4.87e+02 8.61e+02 1.09e+02 8.51e+02 3.32e+02
 1.01e+02 9.49e+02 7.55e+02 5.87e+02 2.70e+02 7.56e+02 8.06e+02 4.70e+02
 9.42e+02 9.05e+02 8.01e+02 4.99e+02 1.53e+02 7.83e+02 9.26e+02 1.81e+02
 7.25e+02 9.00e+00 5.30e+01 3.06e+02 6.30e+02 3.56e+02 8.36e+02 2.70e+01
 9.85e+02 4.57e+02 5.55e+02 6.06e+02 1.24e+02 3.71e+02 2.38e+02 9.22e+02
 3.20e+01 7.57e+02 9.20e+02 1.70e+02 5.98e+02 8.49e+02 4.86e+02 5.83e+02
 4.28e+02 5.78e+02 3.16e+02 8.88e+02 2.47e+02 3.05e+02 6.07e+02 7.31e+02
 6.49e+02 6.50e+01 6.29e+02 7.82e+02 2.74e+02 2.24e+02 3.86e+02 2.89e+02
 6.69e+02 1.21e+02 1.76e+02 4.96e+02 6.54e+02 1.69e+02 3.78e+02 9.30e+01
 8.29e+02 4.35e+02 1.47e+02 7.93e+02 1.22e+02 6.96e+02 1.92e+02 1.46e+02
 4.11e+02 6.32e+02 6.21e+02 6.47e+02 5.32e+02 8.60e+02 9.01e+02 1.39e+02
 8.04e+02 5.36e+02 8.69e+02 4.44e+02 6.88e+02 1.31e+02 8.85e+02 8.75e+02
 1.26e+02 7.70e+02 2.00e+02 5.13e+02 8.62e+02 9.08e+02 6.73e+02 2.91e+02
 7.10e+02 2.09e+02 3.30e+01 1.89e+02 3.91e+02 8.90e+02 7.60e+01 4.10e+02
 4.93e+02 6.89e+02 1.80e+02 3.81e+02 5.46e+02 1.52e+02 4.19e+02 6.94e+02
 7.04e+02 6.57e+02 4.98e+02 2.05e+02 7.97e+02 5.56e+02 2.76e+02 6.86e+02
 6.01e+02 3.39e+02 4.25e+02 7.12e+02 1.43e+02 6.82e+02 7.39e+02 9.69e+02
 4.49e+02 4.69e+02 3.31e+02 4.54e+02 3.15e+02 6.98e+02 6.08e+02 1.66e+02
 3.33e+02 3.12e+02 1.28e+02 7.72e+02 8.91e+02 3.96e+02 6.59e+02 6.20e+02
 3.21e+02 1.85e+02 1.56e+02]

actor_2_name:
	- Total de datos únicos: 3033
	- Valores: ['Joel David Moore' 'Orlando Bloom' 'Rory Kinnear' ... 'Valorie Curry'
 'Maxwell Moody' 'Brian Herzlinger']

actor_1_facebook_likes:
	- Total de datos únicos: 879
	- Valores: [1.00e+03 4.00e+04 1.10e+04 2.70e+04 1.31e+02 6.40e+02 2.40e+04 7.99e+02
 2.60e+04 2.50e+04 1.50e+04 1.80e+04 4.51e+02 2.20e+04 1.00e+04 5.00e+03
 8.91e+02 1.60e+04 6.00e+03 2.90e+04 2.10e+04 1.40e+04 3.00e+03 8.83e+02
 2.00e+04 1.20e+04 8.94e+02 9.74e+02 4.40e+04 2.30e+04 1.70e+04 3.40e+04
 1.90e+04 9.79e+02 2.75e+02 2.00e+03 9.98e+02 2.68e+02 8.70e+04 7.11e+02
 6.22e+02 4.00e+03 7.56e+02 9.75e+02 8.90e+02 6.48e+02 5.44e+02 5.31e+02
 6.62e+02 4.90e+04 3.09e+02 2.34e+02 7.30e+02 9.21e+02 8.51e+02 7.69e+02
 7.83e+02 1.30e+04 9.57e+02 8.20e+02 9.86e+02 7.66e+02 6.13e+02 9.82e+02
 5.35e+02 6.88e+02 6.05e+02 8.45e+02 9.20e+02 7.84e+02 7.74e+02 8.86e+02
 6.90e+02 2.73e+02 9.36e+02 8.33e+02 3.90e+01 6.50e+02 5.96e+02 8.11e+02
 4.80e+02 6.69e+02 3.83e+02 6.81e+02 6.73e+02 7.60e+02 8.52e+02 5.00e+00
 7.52e+02 7.80e+02 5.58e+02 9.25e+02 6.60e+02 8.27e+02 8.00e+03 4.90e+02
 1.44e+02 6.23e+02 9.62e+02 8.73e+02 6.93e+02 7.70e+02 1.91e+02 8.35e+02
 3.50e+04 6.91e+02 5.91e+02 1.92e+02 9.19e+02 6.80e+02 8.87e+02 2.83e+02
 6.70e+02 8.79e+02 5.84e+02 6.72e+02 8.13e+02 6.11e+02 4.09e+02 8.75e+02
 5.77e+02 8.98e+02 3.40e+02 9.00e+03 2.30e+01 9.81e+02 8.82e+02 7.43e+02
 5.29e+02 2.00e+00 5.63e+02 6.70e+01 7.95e+02 1.63e+02 7.00e+03 6.10e+02
 7.19e+02 8.67e+02 9.12e+02 3.74e+02 7.45e+02 5.82e+02 9.33e+02 5.48e+02
 9.06e+02 9.40e+02 2.10e+01 7.10e+02 6.92e+02 8.70e+02 9.60e+02 4.43e+02
 9.03e+02 9.89e+02 3.94e+02 3.49e+02 8.44e+02 5.76e+02 6.14e+02 9.66e+02
 7.88e+02 5.37e+02 9.95e+02 9.01e+02 8.48e+02 6.00e+02 5.09e+02 3.24e+02
 9.63e+02 1.77e+02 6.45e+02 9.67e+02 5.24e+02 4.60e+04 4.19e+02 9.88e+02
 4.33e+02 6.58e+02 8.81e+02 8.54e+02 3.66e+02 8.34e+02 7.32e+02 7.89e+02
 5.39e+02 8.74e+02 3.30e+04 3.10e+02 4.36e+02 5.10e+02 2.87e+02 8.65e+02
 7.22e+02 9.68e+02 8.89e+02 9.73e+02 7.94e+02 9.70e+02 1.42e+02 9.31e+02
 9.26e+02 8.47e+02 4.85e+02 5.34e+02 5.70e+01 8.38e+02 9.39e+02 9.56e+02
 2.26e+02 6.17e+02 2.77e+02 5.93e+02 1.17e+02 6.31e+02 1.34e+02 2.11e+02
 9.61e+02 6.77e+02 1.45e+02 6.25e+02 4.26e+02 1.54e+02 4.16e+02 4.95e+02
 9.92e+02 4.37e+02 2.94e+02 6.27e+02 3.30e+02 3.25e+02 4.50e+04 4.30e+02
 6.29e+02 5.10e+01 9.72e+02 9.84e+02 8.41e+02 5.99e+02 5.98e+02 7.79e+02
 1.13e+02 6.78e+02 2.90e+01 9.24e+02 8.07e+02 5.45e+02 8.21e+02 5.21e+02
 7.49e+02 5.50e+02 4.27e+02 9.13e+02 9.05e+02 7.23e+02 2.67e+02 1.64e+02
 8.49e+02 2.19e+02 4.40e+01 4.05e+02 5.79e+02 4.68e+02 6.83e+02 9.22e+02
 6.24e+02 3.26e+02 6.87e+02 1.07e+02 7.34e+02 6.68e+02 9.70e+01 8.55e+02
 9.04e+02 7.00e+02 1.64e+05 5.78e+02 8.36e+02 7.08e+02 3.87e+02 9.54e+02
 8.80e+02 4.72e+02 1.76e+02 7.20e+02 7.03e+02 7.87e+02 2.93e+02 6.07e+02
 9.64e+02 3.27e+02 8.77e+02 9.78e+02 6.36e+02 7.82e+02 9.76e+02 6.10e+01
 8.84e+02 4.99e+02 2.18e+02 1.52e+02 7.47e+02 7.55e+02 5.92e+02 6.43e+02
 7.60e+01 3.08e+02 6.28e+02 3.70e+01 3.78e+02 7.86e+02 7.40e+02 4.63e+02
 5.51e+02 1.80e+02 4.96e+02 7.46e+02 6.85e+02 9.44e+02 5.41e+02 4.61e+02
 8.08e+02 7.68e+02 7.78e+02 4.92e+02 9.47e+02 9.00e+02 8.12e+02 8.26e+02
 9.69e+02 8.97e+02 2.88e+02 3.96e+02 4.46e+02 7.75e+02 9.07e+02 4.65e+02
 5.53e+02 6.94e+02 6.35e+02 6.95e+02 8.37e+02 3.00e+00 3.00e+01 6.21e+02
 5.54e+02 3.72e+02 6.00e+00 8.18e+02 1.65e+02 7.73e+02 9.18e+02 3.44e+02
 7.16e+02 6.49e+02 6.38e+02 8.39e+02 2.44e+02 6.16e+02 9.71e+02 4.42e+02
 3.03e+02 5.06e+02 5.67e+02 6.96e+02 4.55e+02 5.85e+02 8.69e+02 5.81e+02
 1.57e+02 6.63e+02 7.59e+02 4.89e+02 4.73e+02 4.48e+02 1.47e+02 8.76e+02
 2.95e+02 1.14e+02 5.23e+02 7.57e+02 9.55e+02 5.04e+02 5.59e+02 6.39e+02
 1.73e+02 1.10e+01 6.64e+02 5.80e+02 9.43e+02 7.31e+02 4.22e+02 4.97e+02
 6.30e+01 8.06e+02 7.38e+02 5.94e+02 3.58e+02 3.86e+02 9.53e+02 6.60e+01
 4.91e+02 7.48e+02 3.85e+02 3.32e+02 7.71e+02 6.42e+02 2.60e+02 6.55e+02
 9.34e+02 2.64e+02 4.40e+02 7.13e+02 6.40e+05 7.42e+02 2.72e+02 7.44e+02
 2.96e+02 3.28e+02 6.01e+02 2.85e+02 4.69e+02 4.60e+02 9.00e+01 6.18e+02
 4.50e+01 4.62e+02 9.02e+02 7.98e+02 3.55e+02 9.85e+02 3.38e+02 5.62e+02
 4.00e+02 1.72e+02 1.81e+02 9.41e+02 1.74e+02 8.43e+02 9.27e+02 9.08e+02
 2.79e+02 7.90e+01 8.93e+02 5.12e+02 7.21e+02 5.27e+02 2.01e+02 8.05e+02
 5.56e+02 1.29e+02 3.29e+02 8.96e+02 3.90e+02 1.50e+02 2.40e+02 2.03e+02
 8.09e+02 2.00e+01 5.33e+02 9.80e+01 6.08e+02 5.26e+02 3.04e+02 5.08e+02
 5.65e+02 8.60e+02 3.10e+04 1.41e+02 7.41e+02 1.25e+02 6.46e+02 0.00e+00
 6.06e+02 9.91e+02 9.09e+02 6.51e+02 3.02e+02 1.37e+05 3.46e+02 8.56e+02
 2.04e+02 9.97e+02 8.85e+02 8.29e+02 5.72e+02 8.01e+02 4.88e+02 6.99e+02
 4.60e+01 1.06e+02 4.03e+02 4.83e+02 1.09e+02 7.29e+02 4.86e+02 8.88e+02
 5.00e+02 3.41e+02 1.70e+01 9.48e+02 6.54e+02 8.20e+01 7.27e+02 8.28e+02
 1.55e+02 9.46e+02 5.17e+02 2.10e+02 7.12e+02 4.56e+02 5.97e+02 3.64e+02
 8.40e+01 4.44e+02 9.23e+02 1.58e+02 8.66e+02 5.49e+02 4.82e+02 4.53e+02
 7.15e+02 7.67e+02 6.34e+02 2.14e+02 7.14e+02 2.27e+02 9.96e+02 2.41e+02
 9.11e+02 8.61e+02 6.33e+02 1.75e+02 5.55e+02 5.03e+02 2.32e+02 2.35e+02
 1.03e+02 9.40e+01 8.16e+02 1.16e+02 1.93e+02 4.71e+02 3.54e+02 7.96e+02
 5.95e+02 6.50e+01 4.77e+02 8.57e+02 1.36e+02 6.40e+01 8.25e+02 7.76e+02
 7.35e+02 1.79e+02 4.67e+02 8.92e+02 7.06e+02 7.18e+02 8.78e+02 4.13e+02
 5.43e+02 1.59e+02 4.23e+02 1.49e+02 4.12e+02 3.80e+02 6.97e+02 7.00e+01
 5.69e+02 6.56e+02 1.30e+02 6.19e+02 5.71e+02 8.99e+02 6.59e+02 8.03e+02
 5.52e+02 3.06e+02 1.70e+02 7.70e+01 7.17e+02 9.30e+01 2.80e+01 4.52e+02
 3.47e+02 7.33e+02 3.62e+02 5.25e+02 6.00e+01 7.97e+02 9.00e+00 6.86e+02
 5.20e+01 3.99e+02 1.50e+01 2.15e+02 7.85e+02 2.76e+02 4.25e+02 9.17e+02
 6.65e+02 3.81e+02 5.64e+02 6.80e+01 4.94e+02 7.26e+02 3.71e+02 9.29e+02
 2.20e+02 2.58e+02 7.72e+02 2.40e+01 7.53e+02 8.23e+02 4.02e+02 7.54e+02
 5.30e+02 3.07e+02 9.90e+02 6.03e+02 4.78e+02 7.80e+01 4.39e+02 1.22e+02
 2.98e+02 3.00e+04 1.83e+02 3.98e+02 7.64e+02 5.57e+02 1.48e+02 1.87e+02
 3.11e+02 3.92e+02 7.02e+02 7.00e+00 1.86e+02 6.79e+02 1.85e+02 5.73e+02
 9.49e+02 3.50e+01 7.36e+02 2.05e+02 2.17e+02 1.33e+02 3.89e+02 4.49e+02
 3.63e+02 4.84e+02 6.66e+02 4.34e+02 9.77e+02 5.02e+02 3.73e+02 7.10e+01
 2.62e+02 5.32e+02 2.25e+02 5.13e+02 6.37e+02 2.46e+02 7.61e+02 2.06e+02
 4.74e+02 4.00e+00 9.10e+01 9.90e+01 2.70e+01 8.04e+02 4.35e+02 5.60e+01
 4.31e+02 1.80e+01 3.82e+02 3.21e+02 9.37e+02 2.63e+02 3.97e+02 6.82e+02
 3.91e+02 3.43e+02 2.29e+02 2.50e+01 3.68e+02 3.10e+01 2.86e+02 3.93e+02
 2.08e+02 9.35e+02 2.69e+02 5.20e+02 8.59e+02 5.50e+01 7.50e+01 1.97e+02
 4.32e+02 3.31e+02 5.70e+02 2.37e+02 7.40e+01 1.88e+02 5.89e+02 2.74e+02
 8.50e+01 4.64e+02 1.10e+02 7.07e+02 2.30e+02 5.22e+02 8.62e+02 4.20e+02
 4.80e+01 9.45e+02 2.54e+02 7.62e+02 2.23e+02 7.63e+02 1.66e+02 5.07e+02
 4.14e+02 1.19e+02 8.90e+01 5.75e+02 2.55e+02 1.95e+02 5.60e+02 6.32e+02
 2.80e+02 4.38e+02 9.42e+02 2.53e+02 8.80e+01 5.16e+02 1.40e+02 6.41e+02
 6.74e+02 5.74e+02 5.36e+02 4.00e+01 2.89e+02 4.87e+02 2.31e+02 6.52e+02
 3.19e+02 2.16e+02 4.47e+02 1.05e+02 4.81e+02 1.23e+02 8.71e+02 5.00e+01
 3.34e+02 1.27e+02 5.86e+02 6.75e+02 1.60e+01 7.20e+01 4.20e+01 6.76e+02
 1.28e+02 1.20e+02 8.46e+02 3.14e+02 5.38e+02 2.51e+02 2.65e+02 4.59e+02
 1.78e+02 1.68e+02 3.88e+02 1.89e+02 3.30e+01 3.53e+02 1.60e+02 1.21e+02
 1.61e+02 4.10e+01 6.47e+02 7.01e+02 8.60e+01 2.81e+02 5.30e+01 4.21e+02
 2.60e+05 3.80e+01 1.99e+02 5.01e+02 8.70e+01 2.82e+02      nan 3.40e+01
 4.98e+02 5.11e+02 1.40e+01 1.20e+01 2.66e+02 4.30e+01 3.56e+02 2.02e+02
 6.12e+02 2.45e+02 4.06e+02 1.02e+02 7.24e+02 5.90e+02 9.60e+01 4.58e+02
 2.24e+02 3.35e+02 9.20e+01 6.57e+02 9.38e+02 2.84e+02 4.18e+02 4.90e+01
 2.47e+02 3.12e+02 1.08e+02 8.00e+01 6.02e+02 5.87e+02 4.07e+02 1.96e+02
 5.40e+01 6.90e+01 9.80e+02 3.42e+02 2.39e+02 3.22e+02 4.66e+02 8.00e+00
 2.20e+01 3.61e+02 5.80e+01 5.90e+01 1.69e+02 4.70e+02 6.15e+02 3.59e+02
 4.70e+01 1.90e+01 3.75e+02 3.18e+02 6.44e+02 1.56e+02 3.60e+02 2.50e+02
 7.93e+02 1.00e+02 1.35e+02 3.37e+02 2.36e+02 8.10e+01 8.14e+02 7.28e+02
 7.70e+04 2.59e+02 1.38e+02 5.05e+02 3.60e+01 2.70e+02 3.76e+02 2.00e+02
 8.30e+02 1.26e+02 1.43e+02 2.38e+02 1.18e+02 1.11e+02 2.60e+01 7.25e+02
 2.52e+02 3.20e+01 3.13e+02 3.70e+02 6.30e+02 1.00e+01 2.91e+02]

gross:
	- Total de datos únicos: 4036
	- Valores: [7.60505847e+08 3.09404152e+08 2.00074175e+08 ... 4.58400000e+03
 1.04430000e+04 8.52220000e+04]

genres:
	- Total de datos únicos: 914
	- Valores: ['Action|Adventure|Fantasy|Sci-Fi' 'Action|Adventure|Fantasy'
 'Action|Adventure|Thriller' 'Action|Thriller' 'Documentary'
 'Action|Adventure|Sci-Fi' 'Action|Adventure|Romance'
 'Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance'
 'Adventure|Family|Fantasy|Mystery' 'Action|Adventure'
 'Action|Adventure|Western' 'Action|Adventure|Family|Fantasy'
 'Action|Adventure|Comedy|Family|Fantasy|Sci-Fi' 'Adventure|Fantasy'
 'Action|Adventure|Drama|History' 'Adventure|Family|Fantasy'
 'Action|Adventure|Drama|Romance' 'Drama|Romance'
 'Action|Adventure|Sci-Fi|Thriller' 'Action|Adventure|Fantasy|Romance'
 'Action|Adventure|Fantasy|Sci-Fi|Thriller'
 'Adventure|Animation|Comedy|Family|Fantasy'
 'Adventure|Animation|Comedy|Family|Sport' 'Action|Crime|Thriller'
 'Action|Adventure|Horror|Sci-Fi|Thriller'
 'Adventure|Animation|Family|Sci-Fi' 'Action|Comedy|Crime|Thriller'
 'Animation|Drama|Family|Fantasy' 'Action|Crime|Drama|Thriller'
 'Adventure|Animation|Comedy|Family'
 'Action|Adventure|Animation|Comedy|Family|Sci-Fi'
 'Adventure|Drama|Family|Mystery' 'Action|Comedy|Sci-Fi|Western'
 'Action|Adventure|Fantasy|Horror|Thriller'
 'Action|Adventure|Comedy|Sci-Fi' 'Comedy|Family|Fantasy'
 'Adventure|Animation|Comedy|Drama|Family|Fantasy'
 'Adventure|Drama|Family|Fantasy' 'Action|Adventure|Drama|Fantasy'
 'Action|Adventure|Family|Fantasy|Romance' 'Action|Adventure|Drama|Sci-Fi'
 'Action|Adventure|Romance|Sci-Fi'
 'Action|Adventure|Family|Mystery|Sci-Fi'
 'Action|Adventure|Animation|Comedy|Drama|Family|Sci-Fi'
 'Adventure|Animation|Comedy|Family|Sci-Fi'
 'Adventure|Animation|Family|Fantasy' 'Action|Sci-Fi'
 'Adventure|Drama|Sci-Fi' 'Action|Adventure|Drama|Horror|Sci-Fi'
 'Drama|Fantasy|Romance' 'Adventure|Sci-Fi'
 'Action|Adventure|Drama|Thriller' 'Action|Drama|History|Romance|War'
 'Action|Adventure|Biography|Drama|History|Romance|War' 'Action|Drama'
 'Drama|Horror|Sci-Fi' 'Adventure|Comedy|Family|Fantasy'
 'Animation|Comedy|Family|Fantasy'
 'Action|Adventure|Animation|Comedy|Family'
 'Adventure|Animation|Comedy|Family|Fantasy|Musical' 'Mystery|Thriller'
 'Adventure|Animation|Comedy|Drama|Family'
 'Action|Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi'
 'Comedy|Fantasy|Horror' 'Drama|Fantasy|Horror|Thriller'
 'Action|Drama|Thriller' 'Adventure' 'Action|Comedy|Fantasy|Sci-Fi'
 'Action|Adventure|Comedy|Family|Fantasy|Mystery|Sci-Fi'
 'Action|Adventure|Animation|Fantasy' 'Comedy|Crime'
 'Action|Drama|History|War' 'Action|Adventure|Drama'
 'Action|Adventure|Animation|Comedy|Family|Fantasy'
 'Action|Drama|Mystery|Sci-Fi' 'Action|Adventure|Comedy|Thriller'
 'Action|Adventure|Animation|Fantasy|Romance|Sci-Fi'
 'Action|Adventure|Drama|History|War' 'Adventure|Drama|Fantasy|Romance'
 'Animation|Comedy|Family|Musical' 'Action|Crime|Drama|Mystery|Thriller'
 'Adventure|Drama|Thriller|Western'
 'Adventure|Animation|Comedy|Family|Western' 'Action|Mystery|Thriller'
 'Adventure|Sci-Fi|Thriller'
 'Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi'
 'Action|Crime|Mystery|Thriller' 'Action|Adventure|Family|Mystery'
 'Adventure|Drama|Romance|War' 'Adventure|Animation|Family|Thriller'
 'Action|Fantasy' 'Action|Animation|Comedy|Family|Sci-Fi'
 'Action|Comedy|Fantasy' 'Fantasy'
 'Adventure|Animation|Comedy|Family|Musical'
 'Action|Adventure|Crime|Mystery|Thriller' 'Action|Adventure|History'
 'Action' 'Adventure|Drama|Fantasy' 'Action|Fantasy|Thriller'
 'Action|Adventure|Comedy|Crime' 'Adventure|Mystery|Sci-Fi'
 'Action|Drama|Sci-Fi|Thriller' 'Action|Crime|Sci-Fi|Thriller'
 'Action|Family|Sport' 'Comedy|Drama|Romance' 'Action|Comedy|Romance'
 'Action|Adventure|Mystery|Sci-Fi' 'Action|Drama|War'
 'Adventure|Drama|Sci-Fi|Thriller'
 'Action|Adventure|Comedy|Family|Fantasy' 'Crime|Thriller'
 'Action|Comedy|Crime|Romance|Thriller' 'Biography|Drama'
 'Action|Comedy|Crime|Sci-Fi|Thriller' 'Action|Adventure|Crime'
 'Action|Drama|Fantasy|War' 'Animation|Comedy|Family|Music|Western'
 'Action|Adventure|Mystery|Sci-Fi|Thriller' 'Action|Drama|Sci-Fi|Sport'
 'Action|Crime|Romance|Thriller' 'Action|Adventure|Comedy'
 'Biography|Drama|Sport' 'Action|Mystery|Sci-Fi|Thriller'
 'Animation|Family|Fantasy|Musical|Romance' 'Comedy'
 'Action|Adventure|Romance|Sci-Fi|Thriller' 'Comedy|Romance'
 'Action|Drama|Romance' 'Biography|Crime|Drama|History|Romance'
 'Biography|Crime|Drama' 'Action|Comedy|Thriller' 'Action|Comedy|Crime'
 'Action|Drama|Mystery|Thriller' 'Drama|Western'
 'Animation|Drama|Family|Musical|Romance'
 'Action|Adventure|Comedy|Family|Mystery' 'Action|Romance|Thriller'
 'Action|Fantasy|Horror|Mystery' 'Adventure|Drama|Thriller'
 'Biography|Comedy|Crime|Drama' 'Action|Sci-Fi|War' 'Drama|Sci-Fi'
 'Action|Adventure|Animation|Family|Fantasy'
 'Action|Crime|Fantasy|Romance|Thriller' 'Adventure|Comedy|Sci-Fi'
 'Action|Crime|Sport|Thriller'
 'Action|Adventure|Biography|Drama|History|Thriller'
 'Action|Comedy|Sci-Fi' 'Action|Drama|Thriller|War'
 'Drama|Mystery|Thriller' 'Action|Adventure|Fantasy|Thriller'
 'Crime|Drama' 'Drama|History|Romance|War' 'Animation|Comedy|Family|Sport'
 'Comedy|Sci-Fi|Thriller' 'Drama|History|War'
 'Adventure|Animation|Comedy|Family|Romance'
 'Drama|Family|Fantasy|Romance' 'Drama|Fantasy|Thriller'
 'Drama|Mystery|Romance|Sci-Fi|Thriller' 'Drama|History|War|Western'
 'Action|Adventure|Animation|Family'
 'Adventure|Comedy|Family|Mystery|Sci-Fi'
 'Drama|Fantasy|Horror|Mystery|Thriller' 'Animation|Comedy|Family|Sci-Fi'
 'Adventure|Comedy|Drama|Fantasy|Romance'
 'Action|Adventure|Comedy|Crime|Thriller' 'Crime|Drama|Thriller'
 'Adventure|Animation|Family|Fantasy|Musical|War' 'Action|Comedy'
 'Crime|Drama|Mystery|Thriller' 'Adventure|Drama|History'
 'Action|Adventure|Animation|Family|Fantasy|Sci-Fi'
 'Adventure|Animation|Comedy|Family|Fantasy|Music'
 'Drama|History|Thriller|War' 'Action|Animation|Comedy|Sci-Fi'
 'Comedy|Family|Fantasy|Horror|Mystery' 'Drama|Mystery|Sci-Fi|Thriller'
 'Action|Horror|Sci-Fi|Thriller' 'Crime|Mystery|Thriller'
 'Action|Adventure|Comedy|Crime|Mystery|Thriller' 'Comedy|Drama|Sci-Fi'
 'Action|Family|Fantasy|Musical' 'Drama|History|Sport'
 'Adventure|Drama|Romance' 'Animation|Comedy|Family|Music|Romance'
 'Animation|Comedy|Family|Fantasy|Musical|Romance'
 'Crime|Drama|Horror|Mystery|Thriller' 'Adventure|Comedy|Family'
 'Action|Adventure|Comedy|Fantasy' 'Comedy|Drama|Music|Musical'
 'Adventure|Comedy|Drama|Family|Fantasy' 'Action|Comedy|Fantasy|Romance'
 'Comedy|Romance|Sci-Fi' 'Adventure|Comedy|Mystery'
 'Comedy|Drama|Fantasy|Romance' 'Action|Comedy|Family|Fantasy'
 'Action|Adventure|Fantasy|Horror|Sci-Fi'
 'Crime|Drama|History|Mystery|Thriller' 'Comedy|Drama'
 'Adventure|Animation|Comedy|Drama|Family|Fantasy|Sci-Fi'
 'Action|Drama|Romance|Sci-Fi|Thriller' 'Comedy|Crime|Sport'
 'Comedy|Family|Fantasy|Romance'
 'Action|Adventure|Crime|Drama|Sci-Fi|Thriller'
 'Adventure|Drama|History|Romance|War' 'Comedy|Family|Sci-Fi'
 'Fantasy|Horror|Mystery|Thriller'
 'Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi|Sport'
 'Adventure|Comedy|Crime|Family|Mystery' 'Drama|Sci-Fi|Thriller'
 'Action|Crime|Mystery|Romance|Thriller' 'Action|Adventure|Comedy|Romance'
 'Adventure|Animation|Family|Western' 'Comedy|Family|Romance'
 'Action|Adventure|Family|Sci-Fi|Thriller'
 'Animation|Family|Fantasy|Music'
 'Action|Adventure|Family|Fantasy|Thriller' 'Comedy|Fantasy'
 'Action|Adventure|Comedy|Fantasy|Thriller' 'Drama|Horror|Mystery|Sci-Fi'
 'Action|Sci-Fi|Thriller' 'Drama|History|Thriller'
 'Adventure|Animation|Family' 'Drama|Musical|Romance' 'Documentary|Drama'
 'Action|Adventure|Drama|History|Romance' 'Animation|Family'
 'Adventure|Animation|Drama|Family|Musical'
 'Animation|Comedy|Family|Fantasy|Sci-Fi'
 'Adventure|Animation|Drama|Family|Fantasy' 'Sci-Fi|Thriller'
 'Animation|Comedy|Family' 'Action|Crime|Fantasy|Thriller'
 'Comedy|Drama|Family|Music|Musical|Romance' 'Horror|Mystery|Thriller'
 'Action|Adventure|Comedy|Family|Sci-Fi' 'Comedy|Family'
 'Biography|Comedy|Drama|History' 'Drama|Music|Musical'
 'Crime|Drama|Mystery' 'Comedy|Crime|Music'
 'Action|Comedy|Romance|Thriller'
 'Animation|Comedy|Family|Fantasy|Mystery' 'Comedy|Crime|Drama|Romance'
 'Action|Adventure|Romance|Thriller' 'Drama|History|Romance'
 'Action|Drama|Fantasy|Romance' 'Action|Adventure|Animation|Family|Sci-Fi'
 'Action|Drama|Sci-Fi' 'Drama|Horror|Sci-Fi|Thriller'
 'Animation|Comedy|Fantasy' 'Action|Animation|Comedy|Family'
 'Action|Adventure|Comedy|Romance|Thriller' 'Action|Comedy|Sport'
 'Biography|Drama|History|War' 'Adventure|Animation|Comedy'
 'Action|Drama|Sport' 'Adventure|Drama|Family'
 'Drama|Mystery|Romance|Thriller'
 'Adventure|Animation|Comedy|Family|Fantasy|Romance' 'Adventure|Drama|War'
 'Action|Adventure|Crime|Thriller'
 'Adventure|Drama|Fantasy|Mystery|Thriller'
 'Fantasy|Mystery|Romance|Sci-Fi|Thriller'
 'Drama|Fantasy|Mystery|Thriller' 'Animation|Comedy|Family|Fantasy|Music'
 'Drama|Horror|Romance|Thriller' 'Drama|War' 'Drama'
 'Action|Drama|Fantasy|Horror|War' 'Adventure|Family|Fantasy|Romance'
 'Adventure|Biography|Drama|History|War' 'Action|Adventure|Horror|Sci-Fi'
 'Action|Fantasy|Horror' 'Comedy|Drama|Musical|Romance'
 'Action|Sci-Fi|Sport'
 'Action|Adventure|Animation|Comedy|Crime|Family|Fantasy'
 'Adventure|Animation|Family|Fantasy|Musical'
 'Action|Crime|Mystery|Sci-Fi|Thriller'
 'Action|Comedy|Crime|Drama|Thriller' 'Adventure|Drama|History|Romance'
 'Biography|Drama|Thriller' 'Action|Drama|History|Thriller'
 'Action|Adventure|Fantasy|War' 'Comedy|Fantasy|Romance'
 'Action|Adventure|Comedy|Romance|Thriller|Western'
 'Biography|Drama|Sport|War' 'Comedy|Drama|Family|Musical'
 'Action|Adventure|Fantasy|Horror|Sci-Fi|Thriller' 'Drama|Sport'
 'Action|Fantasy|Sci-Fi|Thriller' 'Drama|Mystery|Romance'
 'Adventure|Biography|Drama|History|Sport|Thriller' 'Crime|Drama|Fantasy'
 'Adventure|Biography|Crime|Drama|Western' 'Action|War'
 'Comedy|Romance|Sport' 'Crime|Drama|Mystery|Thriller|Western'
 'Comedy|Sport' 'Comedy|Drama|Family' 'Crime|Drama|Fantasy|Mystery'
 'Adventure|Animation|Biography|Drama|Family|Fantasy|Musical'
 'Drama|Romance|Western' 'Documentary|Music' 'Drama|Thriller'
 'Animation|Family|Fantasy' 'Action|Fantasy|Horror|Sci-Fi'
 'Biography|Comedy|Drama' 'Action|Horror|Sci-Fi' 'Adventure|Comedy'
 'Biography|Drama|History|Sport' 'Comedy|Crime|Romance|Thriller'
 'Comedy|Crime|Romance' 'Horror|Mystery|Sci-Fi|Thriller'
 'Biography|Drama|Music' 'Drama|Fantasy|Sport'
 'Adventure|Comedy|Drama|Music' 'Action|Fantasy|Horror|Sci-Fi|Thriller'
 'Adventure|Animation|Comedy|Drama|Family|Fantasy|Romance'
 'Horror|Sci-Fi|Thriller' 'Drama|Fantasy|Mystery|Romance|Thriller'
 'Action|Adventure|Drama|History|Romance|War'
 'Drama|Fantasy|Mystery|Romance' 'Fantasy|Horror|Mystery|Romance'
 'Adventure|Comedy|Family|Romance|Sci-Fi' 'Drama|Horror|Thriller'
 'Action|Comedy|Mystery|Romance' 'Action|Adventure|Comedy|Romance|Sci-Fi'
 'Action|Biography|Drama|History|Thriller|War'
 'Adventure|Comedy|Family|Fantasy|Horror' 'Comedy|Family|Romance|Sci-Fi'
 'Action|Adventure|Thriller|War' 'Comedy|Drama|Romance|Sport'
 'Comedy|Western' 'Action|Comedy|Crime|Drama' 'Drama|Music|Romance|War'
 'Action|Comedy|Drama|Family|Thriller' 'Action|Crime'
 'Adventure|Animation|Drama|Family|History|Musical|Romance'
 'Action|Adventure|Drama|Romance|Sci-Fi'
 'Action|Adventure|Comedy|Family|Romance'
 'Action|Adventure|Comedy|Western' 'Biography|Drama|History|Musical'
 'Adventure|Drama|Horror|Thriller' 'Action|Drama|Sport|Thriller'
 'Drama|Musical|Romance|Thriller' 'Comedy|Drama|Family|Fantasy'
 'Adventure|Comedy|Crime|Family|Musical' 'Drama|Music|Musical|Romance'
 'Drama|Mystery|Romance|War' 'Crime|Drama|Romance'
 'Crime|Horror|Mystery|Thriller'
 'Adventure|Animation|Drama|Family|Fantasy|Musical|Mystery|Romance'
 'Action|Horror|Thriller' 'Drama|History|Horror' 'Drama|Romance|Sport'
 'Comedy|Family|Musical|Romance' 'Romance|Sci-Fi|Thriller'
 'Biography|Comedy|Drama|Romance' 'Mystery|Sci-Fi|Thriller'
 'Drama|Fantasy|Horror' 'Adventure|Comedy|Drama|Fantasy|Musical'
 'Horror|Mystery' 'Action|Adventure|Family|Fantasy|Sci-Fi|Thriller'
 'Adventure|Comedy|Family|Fantasy|Romance|Sport'
 'Adventure|Horror|Mystery' 'Crime|Drama|Romance|Thriller'
 'Comedy|Crime|Drama|Thriller' 'Drama|Fantasy' 'Adventure|Comedy|Drama'
 'Action|Biography|Drama|History|War' 'Adventure|Comedy|Fantasy'
 'Adventure|Comedy|Crime|Drama|Family'
 'Action|Biography|Crime|Drama|Thriller' 'Comedy|Sci-Fi'
 'Drama|Romance|Sci-Fi' 'Action|Adventure|Comedy|Crime|Music|Mystery'
 'Comedy|Drama|Music' 'Action|Crime|Drama|Sci-Fi|Thriller'
 'Horror|Thriller' 'Action|Adventure|Comedy|Drama|War'
 'Drama|Mystery|Sci-Fi' 'Crime|Drama|Music'
 'Adventure|Crime|Drama|Western' 'Comedy|Drama|Thriller'
 'Drama|Romance|War' 'Action|Comedy|Crime|Music|Romance|Thriller'
 'Crime|Romance|Thriller' 'Action|Adventure|Drama|Sci-Fi|Thriller'
 'Action|Drama|Fantasy|Thriller|Western'
 'Action|Drama|Mystery|Thriller|War' 'Biography|Crime|Drama|Thriller'
 'Action|Comedy|Crime|Romance' 'Action|Adventure|Family|Fantasy|Sci-Fi'
 'Adventure|Comedy|Family|Musical' 'Action|Horror'
 'Action|Adventure|Horror|Thriller' 'Comedy|Drama|Music|Romance'
 'Action|Crime|Drama|Romance|Thriller' 'Comedy|Family|Romance|Sport'
 'Drama|Family|Fantasy' 'Drama|Fantasy|Musical|Romance'
 'Adventure|Comedy|Family|Fantasy|Sci-Fi' 'Comedy|Musical'
 'Biography|Drama|History' 'Action|Crime|Drama|Thriller|War'
 'Comedy|Crime|Thriller' 'Drama|Fantasy|Horror|Mystery'
 'Action|Animation|Comedy|Family|Fantasy'
 'Biography|Drama|History|Thriller'
 'Action|Adventure|Crime|Drama|Mystery|Thriller'
 'Animation|Family|Fantasy|Musical' 'Adventure|Drama|Western'
 'Biography|Drama|History|Romance' 'Drama|Horror|Mystery|Thriller'
 'Action|Fantasy|Western' 'Comedy|War' 'Drama|Music'
 'Action|Drama|Family|Sport' 'Action|Biography|Drama|Thriller|War'
 'Comedy|Drama|Sport' 'Adventure|Comedy|Sci-Fi|Western'
 'Fantasy|Horror|Romance' 'Biography|Drama|Romance'
 'Action|Adventure|Drama|Romance|War' 'Adventure|Comedy|Crime|Romance'
 'Comedy|Drama|Family|Fantasy|Romance' 'Horror' 'Comedy|Music'
 'Action|Adventure|Drama|Romance|Thriller' 'Biography|Drama|Music|Musical'
 'Drama|History' 'Comedy|Music|Romance'
 'Action|Adventure|Crime|Fantasy|Mystery|Thriller'
 'Adventure|Drama|Mystery' 'Biography|Crime|Drama|Music'
 'Crime|Drama|Horror|Thriller'
 'Adventure|Animation|Comedy|Drama|Family|Fantasy|Musical'
 'Action|Adventure|Comedy|Music|Thriller'
 'Adventure|Animation|Comedy|Crime|Family'
 'Comedy|Romance|Sci-Fi|Thriller' 'Comedy|Crime|Family|Romance'
 'Crime|Horror|Thriller' 'Action|Horror|Mystery|Sci-Fi|Thriller'
 'Comedy|Fantasy|Sci-Fi' 'Adventure|Animation|Comedy|Fantasy|Romance'
 'Action|Adventure|Family|Thriller'
 'Adventure|Comedy|Drama|Romance|Thriller|War'
 'Adventure|Animation|Comedy|Fantasy|Music|Romance' 'Action|Drama|Fantasy'
 'Action|Adventure|Drama|Fantasy|War' 'Drama|Fantasy|Romance|Sci-Fi'
 'Animation|Comedy|Family|Horror|Sci-Fi' 'Biography|Drama|Romance|Sport'
 'Action|Biography|Drama' 'Adventure|Drama' 'Horror|Mystery|Sci-Fi'
 'Action|Adventure|Drama|Thriller|Western'
 'Adventure|Family|Fantasy|Sci-Fi' 'Adventure|Comedy|History|Romance'
 'Action|Biography|Drama|Sport' 'Drama|Family'
 'Action|Adventure|Crime|Drama|Family|Fantasy|Romance|Thriller'
 'Biography|Comedy|Romance' 'Action|Biography|Drama|History'
 'Biography|Drama|War' 'Adventure|Comedy|Family|Sci-Fi'
 'Biography|Drama|Family|History|Sport'
 'Biography|Comedy|Drama|History|Music' 'Fantasy|Horror'
 'Comedy|Drama|Family|Sport' 'Comedy|Drama|Romance|Sci-Fi'
 'Adventure|Animation|Comedy|Family|War' 'Action|Comedy|Sci-Fi|Thriller'
 'Comedy|Horror' 'Drama|Thriller|War' 'Action|Western'
 'Action|Adventure|Family|Sci-Fi' 'Adventure|Biography|Drama|Thriller'
 'Drama|Romance|War|Western' 'Action|Comedy|Crime|Western'
 'Action|Adventure|Comedy|Drama|Thriller' 'Drama|Music|Romance'
 'Action|Adventure|Crime|Drama|Thriller' 'Adventure|Comedy|Family|Sport'
 'Comedy|Drama|Fantasy' 'Comedy|Family|Sport'
 'Action|Adventure|Drama|Family' 'Action|Comedy|War' 'Drama|Family|Sport'
 'Action|Thriller|Western' 'Action|Drama|Fantasy|Horror|Thriller'
 'Animation|Comedy|Family|Fantasy|Musical'
 'Action|Adventure|Comedy|Fantasy|Romance'
 'Action|Crime|Drama|Mystery|Sci-Fi|Thriller'
 'Adventure|Comedy|Crime|Drama' 'Drama|Mystery'
 'Comedy|Fantasy|Horror|Thriller' 'Crime|Drama|Mystery|Sci-Fi|Thriller'
 'Comedy|Crime|Musical' 'Comedy|Drama|Family|Music|Romance'
 'Comedy|Horror|Romance' 'Comedy|Family|Fantasy|Sport'
 'Animation|Comedy|Family|Mystery|Sci-Fi'
 'Adventure|Comedy|Drama|Family|Sport'
 'Animation|Drama|Family|Fantasy|Musical|Romance'
 'Comedy|Horror|Musical|Sci-Fi' 'Crime|Drama|Sport'
 'Action|Adventure|Animation|Drama|Mystery|Sci-Fi|Thriller'
 'Action|Adventure|Crime|Drama|Romance' 'Action|Comedy|Horror'
 'Adventure|Horror|Thriller' 'Adventure|Fantasy|Mystery'
 'Action|Drama|Romance|Sport' 'Biography|Crime|Drama|History|Western'
 'Action|Biography|Crime|Drama' 'Adventure|Animation|Fantasy'
 'Adventure|Animation|Comedy|Fantasy' 'Biography|Drama|Music|Romance'
 'Adventure|Drama|Mystery|Sci-Fi|Thriller'
 'Biography|Comedy|Crime|Drama|Romance|Thriller'
 'Biography|Crime|Drama|History|Music'
 'Adventure|Animation|Comedy|Drama|Family|Musical'
 'Biography|Comedy|Drama|Music|Romance' 'Adventure|Animation|Sci-Fi'
 'Drama|Romance|Thriller' 'Action|Fantasy|Horror|Thriller'
 'Adventure|Biography' 'Action|Comedy|Family' 'Action|Horror|Romance'
 'Adventure|Drama|History|Romance|Thriller|War'
 'Crime|Drama|Sci-Fi|Thriller' 'Action|Comedy|Crime|Music'
 'Comedy|Drama|Family|Romance'
 'Action|Drama|Fantasy|Mystery|Sci-Fi|Thriller'
 'Adventure|Family|Fantasy|Horror|Mystery'
 'Action|Crime|Drama|History|Western' 'Comedy|Crime|Drama'
 'Comedy|Family|Fantasy|Music|Romance' 'Adventure|Comedy|Crime|Music'
 'Action|Adventure|Comedy|Sci-Fi|Thriller' 'Action|Crime|Drama|Western'
 'Action|Adventure|Comedy|Family|Romance|Sci-Fi'
 'Action|Fantasy|Romance|Sci-Fi' 'Comedy|Crime|Mystery|Romance'
 'Adventure|Family' 'Action|Drama|Music|Romance'
 'Adventure|Comedy|Family|Fantasy|Horror|Mystery'
 'Adventure|Fantasy|Mystery|Thriller'
 'Action|Biography|Drama|History|Romance|Western' 'Fantasy|Horror|Mystery'
 'Biography|Drama|Family'
 'Action|Adventure|Comedy|Crime|Family|Romance|Thriller'
 'Comedy|Fantasy|Horror|Romance' 'Comedy|Family|Music'
 'Action|Comedy|Music' 'Adventure|Comedy|Crime'
 'Biography|Comedy|Drama|Sport' 'Fantasy|Horror|Thriller'
 'Comedy|Drama|Romance|Thriller' 'Adventure|Comedy|Family|Romance'
 'Adventure|Family|Fantasy|Musical'
 'Biography|Crime|Drama|History|Thriller'
 'Action|Animation|Comedy|Family|Fantasy|Sci-Fi' 'Crime|Drama|History'
 'Biography|Drama|Thriller|War' 'Drama|Music|Mystery|Romance|Thriller'
 'Action|Adventure|Fantasy|Horror' 'Crime|Drama|Mystery|Romance'
 'Action|Adventure|History|Romance' 'Action|Drama|Western'
 'Adventure|Comedy|Family|Fantasy|Music|Sci-Fi'
 'Adventure|Family|Fantasy|Music|Musical'
 'Action|Adventure|Animation|Comedy|Fantasy'
 'Adventure|Comedy|Horror|Sci-Fi' 'Horror|Sci-Fi'
 'Biography|Comedy|Drama|Family|Sport'
 'Action|Crime|Drama|Thriller|Western' 'Action|Drama|History'
 'Drama|Fantasy|Romance|Thriller' 'Thriller' 'Comedy|Mystery'
 'Comedy|Drama|Musical|Romance|War' 'Drama|History|Music|Romance|War'
 'Comedy|History' 'Adventure|Animation|Family|Sport'
 'Animation|Comedy|Fantasy|Musical' 'Game-Show|Reality-TV|Romance'
 'Action|Comedy|Documentary' 'Adventure|Comedy|Drama|Family|Romance'
 'Adventure|Comedy|Drama|Family|Mystery' 'Drama|Family|Music|Romance'
 'Fantasy|Romance' 'Adventure|Animation|Family|Musical'
 'Animation|Comedy|Drama|Family|Musical' 'Biography|Crime|Drama|History'
 'Adventure|Comedy|Fantasy|Music|Sci-Fi'
 'Comedy|Drama|Musical|Romance|Western' 'Action|Adventure|Drama|Mystery'
 'Comedy|Crime|Family|Mystery|Romance|Thriller'
 'Action|Adventure|Drama|Romance|Western'
 'Adventure|Crime|Mystery|Sci-Fi|Thriller' 'Crime|Drama|Western'
 'Adventure|Comedy|Drama|Fantasy' 'Adventure|Biography|Drama'
 'Adventure|Drama|Horror|Mystery|Thriller' 'Crime|Fantasy|Horror'
 'Animation|Family|Fantasy|Mystery' 'Action|Comedy|Crime|Fantasy'
 'Comedy|Family|Music|Musical' 'Crime|Documentary|News'
 'Drama|Mystery|Romance|Thriller|War' 'Action|Crime|Drama|Sport'
 'Comedy|Drama|Music|War' 'Comedy|Musical|Romance'
 'Comedy|Drama|Music|Musical|Romance' 'Comedy|Crime|Drama|Mystery|Romance'
 'Biography|Comedy|Drama|History|Music|Musical'
 'Animation|Drama|Mystery|Sci-Fi|Thriller'
 'Adventure|Comedy|Drama|Romance'
 'Comedy|Drama|Mystery|Romance|Thriller|War' 'Biography|Comedy|Musical'
 'Action|Adventure|Animation|Family|Sci-Fi|Thriller'
 'Crime|Drama|Mystery|Romance|Thriller' 'Comedy|Family|Fantasy|Sci-Fi'
 'Action|Comedy|Crime|Fantasy|Horror|Mystery|Sci-Fi|Thriller'
 'Romance|Short' 'Animation' 'Drama|Horror'
 'Comedy|Drama|Reality-TV|Romance' 'Adventure|Comedy|Romance'
 'Family|Fantasy|Music' 'Crime|Drama|Music|Thriller'
 'Action|Drama|Fantasy|Mystery|Thriller' 'Biography|Drama|History|Music'
 'Biography|Drama|Family|Sport' 'Comedy|Drama|War'
 'Biography|Drama|Romance|War' 'Action|Horror|Romance|Sci-Fi|Thriller'
 'Music' 'Action|Drama|History|Romance|War|Western'
 'Action|Animation|Sci-Fi|Thriller' 'Action|Animation|Comedy|Crime|Family'
 'Drama|Family|Music|Musical' 'Drama|Family|Musical|Romance'
 'Comedy|Drama|Family|Fantasy|Sci-Fi' 'Comedy|Crime|Drama|Music|Romance'
 'Adventure|Comedy|Family|Fantasy|Musical' 'Adventure|Crime|Drama|Romance'
 'Comedy|Mystery|Sci-Fi|Thriller' 'Sci-Fi' 'Drama|Fantasy|War'
 'Action|Comedy|Crime|Family' 'Action|Comedy|Mystery'
 'Comedy|Crime|Mystery' 'Action|Crime|Sci-Fi' 'Comedy|Horror|Sci-Fi'
 'Action|Comedy|Drama|Thriller' 'Drama|Family|Romance'
 'Adventure|Comedy|Family|Music|Romance' 'Comedy|Horror|Thriller'
 'Comedy|Family|Music|Romance' 'Adventure|Fantasy|Horror|Mystery|Thriller'
 'Crime|Drama|Musical|Romance' 'Family|Music|Romance'
 'Drama|Fantasy|Mystery|Sci-Fi' 'Biography|Drama|History|Thriller|War'
 'Adventure|Crime|Drama|Mystery|Western' 'Drama|Fantasy|Horror|Romance'
 'Comedy|Crime|Drama|Thriller|War'
 'Action|Adventure|Drama|History|Thriller|War' 'Action|Comedy|Drama|War'
 'Comedy|Drama|Fantasy|Music|Romance' 'Biography|Drama|Fantasy|History'
 'Biography' 'Drama|Family|Music' 'Adventure|Mystery|Thriller'
 'Comedy|Mystery|Romance' 'Biography|Crime|Drama|War'
 'Crime|Drama|Music|Mystery|Thriller' 'Biography|Comedy|Drama|War'
 'Comedy|Crime|Family|Sci-Fi' 'Adventure|Family|Sci-Fi'
 'Adventure|Comedy|Romance|Sci-Fi' 'Action|Adventure|Comedy|Family'
 'Biography|Comedy|Crime|Drama|Romance' 'Crime|Drama|Musical'
 'Animation|Comedy|Crime|Drama|Family'
 'Action|Adventure|Comedy|Fantasy|Mystery'
 'Action|Adventure|Drama|Thriller|War' 'Crime|Drama|Music|Romance'
 'Adventure|Animation|Comedy|Crime' 'Adventure|Comedy|Fantasy|Sci-Fi'
 'Comedy|Drama|Family|Fantasy|Musical'
 'Action|Adventure|Biography|Drama|History' 'Comedy|Crime|Family'
 'Adventure|Drama|Thriller|War' 'Comedy|Drama|Horror|Sci-Fi'
 'Adventure|Crime|Thriller' 'Mystery|Romance|Sci-Fi|Thriller'
 'Fantasy|Mystery|Thriller' 'Family|Musical'
 'Adventure|Crime|Drama|Mystery|Thriller' 'Drama|Fantasy|Music|Romance'
 'Adventure|Drama|History|War' 'Family|Sci-Fi'
 'Drama|History|Romance|Western' 'Adventure|Comedy|Music|Sci-Fi'
 'Drama|Family|Musical' 'Action|Comedy|Drama|Music'
 'Fantasy|Horror|Sci-Fi' 'Western' 'Comedy|Romance|Thriller'
 'Biography|Crime|Drama|Romance' 'Adventure|Comedy|Drama|Romance|Sci-Fi'
 'Drama|Music|Mystery|Romance' 'Action|Crime|Drama'
 'Adventure|Biography|Drama|War' 'Action|Comedy|Drama'
 'Adventure|Animation' 'Comedy|Drama|Horror|Romance'
 'Action|Comedy|Drama|Western' 'Comedy|Crime|Drama|Mystery'
 'Adventure|Animation|Fantasy|Horror|Sci-Fi'
 'Action|Drama|Romance|Thriller' 'Biography|Comedy|Drama|Family|Romance'
 'Action|Biography|Drama|History|Romance|War'
 'Action|Animation|Fantasy|Horror|Mystery|Sci-Fi|Thriller'
 'Action|Adventure|Animation|Drama|Fantasy|Sci-Fi' 'Horror|Musical|Sci-Fi'
 'Biography|Drama|Family|Musical|Romance'
 'Comedy|Crime|Drama|Romance|Thriller' 'Adventure|Drama|Fantasy|Mystery'
 'Animation|Comedy|Drama|Romance' 'Comedy|Crime|Musical|Romance'
 'Comedy|Crime|Musical|Mystery' 'Action|Animation|Sci-Fi'
 'Drama|War|Western' 'Drama|Romance|Sci-Fi|Thriller'
 'Animation|Biography|Drama|War' 'Adventure|Fantasy|Thriller'
 'Documentary|Sport' 'Crime|Horror' 'Adventure|Biography|Drama|History'
 'Action|Crime|Horror|Sci-Fi|Thriller' 'Comedy|Fantasy|Horror|Mystery'
 'Action|Adventure|Animation|Comedy|Drama|Family|Fantasy|Thriller'
 'Action|Adventure|Drama|Fantasy|Sci-Fi' 'Drama|Mystery|War'
 'Action|Comedy|Crime|Drama|Romance|Thriller' 'Comedy|Drama|Musical'
 'Mystery|Romance|Thriller' 'Adventure|Comedy|Drama|Family'
 'Action|Adventure|Drama|Western' 'Musical|Romance'
 'Documentary|Drama|War' 'Biography|Crime|Drama|Western'
 'Comedy|Family|Fantasy|Musical' 'Crime|Drama|Musical|Romance|Thriller'
 'Fantasy|Horror|Romance|Thriller' 'Adventure|Documentary|Short'
 'Adventure|Crime|Drama|Thriller' 'Thriller|War' 'Action|Sport' 'Musical'
 'Mystery|Western' 'Comedy|Drama|History|Romance'
 'Comedy|Horror|Sci-Fi|Thriller' 'Drama|Horror|Mystery|Sci-Fi|Thriller'
 'Comedy|Documentary' 'Adventure|Drama|Family|Fantasy|Sci-Fi'
 'Adventure|Drama|Family|Romance|Western' 'Adventure|Horror'
 'Comedy|Music|Sci-Fi' 'Biography|Crime|Drama|Romance|Thriller'
 'Comedy|Crime|Drama|Mystery|Thriller'
 'Biography|Crime|Drama|Mystery|Thriller' 'Crime|Horror|Music|Thriller'
 'Crime|Documentary|War' 'Crime|Thriller|War'
 'Comedy|Crime|Horror|Thriller' 'Animation|Comedy' 'Family'
 'Comedy|Drama|Romance|War' 'Biography|Drama|Romance|Western'
 'Drama|Musical' 'Adventure|Comedy|Western'
 'Action|Drama|History|Thriller|War' 'Fantasy|Thriller'
 'Drama|Horror|Mystery' 'Adventure|Drama|History|Thriller|War'
 'Comedy|Documentary|Drama|Fantasy|Mystery|Sci-Fi'
 'Crime|Drama|Fantasy|Romance' 'Action|Crime|Horror|Thriller'
 'Comedy|Horror|Mystery' 'Drama|Family|History|Musical'
 'Adventure|Biography|Drama|Romance' 'Adventure|War|Western'
 'Biography|Comedy|Musical|Romance|Western'
 'Adventure|Comedy|Musical|Romance' 'Comedy|Drama|Romance|Western'
 'Action|Adventure|Comedy|Musical' 'Comedy|Drama|Fantasy|Horror'
 'Action|Biography|Crime|Drama|Family|Fantasy'
 'Action|Animation|Crime|Sci-Fi|Thriller' 'Action|Comedy|Horror|Thriller'
 'Crime|Documentary|Drama' 'Biography|Comedy|Documentary'
 'Comedy|Thriller' 'Comedy|Documentary|Music'
 'Action|Adventure|Romance|Western' 'Crime|Drama|History|Romance'
 'Family|Fantasy|Musical' 'Comedy|Drama|Horror' 'Drama|Family|Western'
 'Comedy|Drama|Horror|Sci-Fi|Thriller' 'Drama|Horror|Romance'
 'Adventure|Crime|Drama' 'Action|Adventure|Crime|Drama'
 'Adventure|Family|Sport' 'Romance'
 'Action|Adventure|Animation|Comedy|Sci-Fi' 'Drama|Fantasy|Romance|War'
 'Documentary|History|Sport' 'Action|Drama|Horror|Thriller'
 'Comedy|Crime|Drama|Sci-Fi' 'Comedy|Family|Musical|Romance|Short'
 'Comedy|Documentary|War' 'Comedy|Drama|Mystery|Romance|Thriller'
 'Action|Comedy|Horror|Sci-Fi' 'Adventure|Drama|Romance|Western'
 'Animation|Comedy|Drama' 'Adventure|Documentary|Drama|Sport'
 'Crime|Documentary' 'Animation|Biography|Documentary|Drama|History|War'
 'Documentary|War' 'Documentary|History' 'Biography|Documentary|History'
 'Action|Adventure|Comedy|Drama|Music|Sci-Fi'
 'Biography|Comedy|Drama|Music' 'Animation|Comedy|Family|Romance'
 'Horror|Romance|Sci-Fi' 'Action|Comedy|Fantasy|Horror'
 'Crime|Drama|Film-Noir|Mystery|Thriller' 'Comedy|Fantasy|Musical|Sci-Fi'
 'Action|Adventure|History|Western' 'Documentary|Drama|History|News'
 'Biography|Crime|Documentary|History|Thriller' 'Crime|Drama|Film-Noir'
 'Film-Noir|Mystery|Romance|Thriller' 'Comedy|Crime|Sci-Fi|Thriller'
 'Adventure|Comedy|Horror' 'Action|Crime|Drama|Mystery'
 'Horror|Romance|Thriller' 'Drama|Film-Noir|Mystery|Thriller'
 'Drama|Film-Noir' 'Crime|Film-Noir|Thriller'
 'Action|Adventure|Romance|War' 'Action|Horror|Mystery|Thriller'
 'Adventure|Comedy|Sport' 'Comedy|Horror|Musical'
 'Adventure|Comedy|History' 'Action|Drama|Romance|War'
 'Biography|Documentary|Music' 'Comedy|Fantasy|Mystery'
 'Biography|Crime|Documentary|History'
 'Adventure|Biography|Documentary|Drama'
 'Action|Adventure|Comedy|Fantasy|Sci-Fi' 'Drama|Musical|Sci-Fi'
 'Documentary|News' 'Comedy|Fantasy|Thriller' 'Animation|Drama|Family'
 'Drama|Fantasy|Sci-Fi' 'Action|Comedy|Drama|Sci-Fi'
 'Action|Adventure|Drama|War' 'Horror|Sci-Fi|Short|Thriller'
 'Action|Adventure|Animation|Comedy|Fantasy|Sci-Fi' 'Thriller|Western'
 'Documentary|Drama|Sport' 'Documentary|History|Music'
 'Biography|Documentary|Drama' 'Adventure|Family|Romance'
 'Adventure|Biography|Drama|Horror|Thriller' 'Documentary|Family|Music'
 'Biography|Documentary|Sport' 'History' 'Action|Romance|Sport'
 'Horror|Musical' 'Comedy|Mystery|Thriller'
 'Action|Biography|Documentary|Sport' 'Comedy|Fantasy|Horror|Musical'
 'Drama|Fantasy|Sci-Fi|Thriller' 'Biography|Documentary' 'Animation|Drama'
 'Action|Fantasy|Horror|Mystery|Thriller' 'Action|Comedy|Sci-Fi|Sport'
 'Comedy|Crime|Drama|Horror|Mystery|Thriller'
 'Action|Adventure|Mystery|Romance|Thriller'
 'Animation|Comedy|Drama|Fantasy|Sci-Fi' 'Action|Drama|Fantasy|Sci-Fi'
 'Comedy|Short' 'Adventure|Drama|Fantasy|Thriller|Western'
 'Adventure|Horror|Sci-Fi' 'Comedy|Drama|History|Musical|Romance'
 'Comedy|Horror|Mystery|Thriller' 'Drama|Music|Mystery|Romance|Sci-Fi'
 'Adventure|Documentary' 'Documentary|Family'
 'Comedy|Crime|Drama|Horror|Thriller' 'Comedy|Documentary|Drama'
 'Crime|Drama|Horror' 'Comedy|Crime|Horror']
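
The `genres` values above are pipe-delimited combinations, so the list of unique strings is much longer than the set of individual genre labels. A minimal sketch of splitting the combinations into single labels and counting them follows. The list `sample_genres` and the helper `count_genre_labels` are hypothetical names introduced for illustration, and the sample holds only four strings copied from the output above, not the full column.

```python
from collections import Counter

# Hypothetical sample: four strings copied from the output above,
# standing in for the full `genres` column.
sample_genres = [
    "Action|Adventure|Animation|Comedy|Family",
    "Mystery|Thriller",
    "Comedy|Crime",
    "Adventure",
]


def count_genre_labels(values):
    """Split each pipe-delimited string and count the individual labels."""
    counts = Counter()
    for value in values:
        counts.update(value.split("|"))
    return counts


genre_counts = count_genre_labels(sample_genres)
print(genre_counts.most_common(3))
```

If the data is held in a pandas DataFrame (assumed here to be called `df`), the same idea could likely be expressed as `df["genres"].str.split("|").explode().value_counts()`.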

actor_1_name:
	- Total unique values: 2098
	- Values: ['CCH Pounder' 'Johnny Depp' 'Christoph Waltz' ... 'Natalie Zea'
 'Eva Boehnke' 'John August']

movie_title:
	- Total unique values: 4917
	- Values: ['Avatar\xa0' "Pirates of the Caribbean: At World's End\xa0" 'Spectre\xa0'
 ... 'A Plague So Pleasant\xa0' 'Shanghai Calling\xa0'
 'My Date with Drew\xa0']
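
Every `movie_title` value shown above ends with `\xa0`, a non-breaking space, which can make exact-match lookups and duplicate checks fail. A minimal sketch of one way to clean the titles follows. The helper `clean_title` and the list `raw_titles` are hypothetical names, and the three titles are copied from the output above.

```python
def clean_title(raw_title):
    """Replace non-breaking spaces with regular spaces and trim both ends."""
    return raw_title.replace("\xa0", " ").strip()


# Hypothetical sample: three titles copied from the output above.
raw_titles = ["Avatar\xa0", "Spectre\xa0", "My Date with Drew\xa0"]
clean_titles = [clean_title(title) for title in raw_titles]
print(clean_titles)
```

On a pandas DataFrame (assumed here to be called `df`), a comparable cleanup would presumably be `df["movie_title"].str.replace("\xa0", " ").str.strip()`.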

num_voted_users:
	- Total unique values: 4826
	- Values: [886204 471220 275868 ...  73839   1255   4285]

cast_total_facebook_likes:
	- Total unique values: 3978
	- Values: [ 4834 48350 11700 ...    93   690  2386]

actor_3_name:
	- Total unique values: 3522
	- Values: ['Wes Studi' 'Jack Davenport' 'Stephanie Sigman' ... 'David Chandler'
 'Eliza Coupe' 'Jon Gunn']

facenumber_in_poster:
	- Total unique values: 20
	- Values: [ 0.  1.  4.  3.  2.  6.  7.  5.  8. nan 10. 15.  9. 11. 12. 31. 14. 19.
 13. 43.]

plot_keywords:
	- Total unique values: 4761
	- Values: ['avatar|future|marine|native|paraplegic'
 'goddess|marriage ceremony|marriage proposal|pirate|singapore'
 'bomb|espionage|sequel|spy|terrorist' ...
 'fraud|postal worker|prison|theft|trial'
 'cult|fbi|hideout|prison escape|serial killer'
 'actress name in title|crush|date|four word title|video camera']

movie_imdb_link:
	- Total unique values: 4919
	- Values: ['http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1'
 'http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1'
 'http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1' ...
 'http://www.imdb.com/title/tt2107644/?ref_=fn_tt_tt_1'
 'http://www.imdb.com/title/tt2070597/?ref_=fn_tt_tt_1'
 'http://www.imdb.com/title/tt0378407/?ref_=fn_tt_tt_1']

num_user_for_reviews:
	- Total unique values: 955
	- Values: [3.054e+03 1.238e+03 9.940e+02 2.701e+03       nan 7.380e+02 1.902e+03
 3.870e+02 1.117e+03 9.730e+02 3.018e+03 2.367e+03 1.243e+03 1.832e+03
 7.110e+02 2.536e+03 4.380e+02 1.722e+03 4.840e+02 3.410e+02 8.020e+02
 1.225e+03 5.460e+02 9.510e+02 6.660e+02 2.618e+03 2.528e+03 1.022e+03
 7.510e+02 1.290e+03 1.498e+03 1.303e+03 1.187e+03 7.360e+02 1.912e+03
 2.650e+02 1.439e+03 9.180e+02 5.110e+02 1.067e+03 6.650e+02 2.830e+02
 5.500e+02 7.330e+02 9.740e+02 6.570e+02 9.950e+02 7.520e+02 1.171e+03
 2.050e+02 7.530e+02 4.530e+02 1.106e+03 8.990e+02 2.054e+03 3.450e+02
 4.280e+02 4.320e+02 1.043e+03 2.210e+02 1.055e+03 2.490e+02 7.200e+02
 2.390e+02 1.463e+03 6.220e+02 4.667e+03 7.040e+02 1.870e+02 6.780e+02
 6.480e+02 5.010e+02 9.710e+02 2.570e+02 7.410e+02 3.090e+02 5.340e+02
 7.730e+02 3.980e+02 7.230e+02 7.100e+02 6.340e+02 6.200e+02 1.500e+01
 3.240e+02 7.420e+02 1.730e+02 4.970e+02 4.330e+02 4.440e+02 5.200e+02
 4.920e+02 1.676e+03 1.097e+03 2.725e+03 2.803e+03 1.300e+01 1.367e+03
 9.880e+02 8.220e+02 6.980e+02 3.830e+02 2.380e+02 6.290e+02 1.310e+02
 3.260e+02 7.810e+02 8.670e+02 2.270e+02 1.999e+03 1.782e+03 1.390e+03
 1.108e+03 1.896e+03 5.900e+02 1.413e+03 1.361e+03 6.260e+02 2.685e+03
 1.190e+02 2.090e+02 6.410e+02 2.121e+03 9.040e+02 2.789e+03 5.320e+02
 1.588e+03 4.350e+02 1.780e+02 9.000e+01 2.530e+02 4.790e+02 4.400e+02
 2.060e+02 1.382e+03 8.710e+02 4.340e+02 1.120e+02 1.220e+02 1.860e+02
 1.300e+02 1.694e+03 1.540e+02 1.185e+03 1.211e+03 6.060e+02 5.050e+02
 1.450e+02 5.120e+02 1.740e+02 2.580e+02 9.280e+02 1.559e+03 2.012e+03
 3.430e+02 2.730e+02 3.880e+02 1.229e+03 2.870e+02 1.445e+03 2.880e+02
 4.630e+02 7.880e+02 6.790e+02 6.830e+02 6.840e+02 3.290e+02 7.900e+01
 6.430e+02 7.400e+01 1.060e+02 1.188e+03 3.370e+02 1.180e+02 8.200e+02
 3.600e+02 5.490e+02 7.060e+02 2.140e+02 2.741e+03 1.370e+02 5.140e+02
 1.240e+03 4.470e+02 1.504e+03 4.500e+02 7.440e+02 2.410e+02 2.000e+00
 1.260e+02 1.571e+03 2.100e+02 2.113e+03 5.910e+02 1.966e+03 9.900e+01
 3.660e+02 4.120e+02 6.370e+02 3.910e+02 5.040e+02 1.018e+03 4.820e+02
 1.159e+03 1.426e+03 7.790e+02 4.360e+02 7.550e+02 6.810e+02 2.970e+02
 5.540e+02 2.326e+03 6.900e+01 8.140e+02 6.300e+02 4.140e+02 1.960e+02
 3.480e+02 8.920e+02 3.286e+03 3.516e+03 5.930e+02 5.330e+02 3.597e+03
 1.950e+02 3.600e+01 4.540e+02 1.340e+02 4.910e+02 1.885e+03 2.770e+02
 1.150e+02 6.950e+02 4.990e+02 3.280e+02 1.144e+03 6.270e+02 7.980e+02
 7.990e+02 1.210e+02 4.430e+02 9.700e+01 5.230e+02 1.530e+02 8.800e+01
 1.440e+02 4.260e+02 5.900e+01 2.480e+02 7.820e+02 5.060e+03 1.910e+02
 3.860e+02 4.560e+02 7.890e+02 9.420e+02 1.790e+02 1.023e+03 8.900e+01
 4.980e+02 2.368e+03 1.331e+03 8.580e+02 2.301e+03 1.368e+03 9.830e+02
 5.850e+02 4.580e+02 3.510e+02 2.850e+02 1.520e+02 3.160e+02 1.193e+03
 2.300e+02 4.740e+02 6.920e+02 1.690e+03 1.130e+02 3.740e+02 5.240e+02
 1.138e+03 5.390e+02 1.049e+03 8.280e+02 1.600e+02 6.600e+02 2.690e+02
 2.170e+02 2.240e+02 1.630e+02 1.610e+02 1.640e+02 2.610e+02 1.550e+02
 2.630e+02 8.900e+02 1.166e+03 4.070e+02 1.103e+03 1.770e+02 2.930e+02
 1.390e+02 3.220e+02 8.660e+02 3.189e+03 2.417e+03 8.240e+02 4.620e+02
 1.236e+03 6.460e+02 1.250e+02 5.030e+02 2.670e+02 8.150e+02 1.690e+02
 4.190e+02 2.890e+02 5.150e+02 3.940e+02 1.560e+02 1.320e+02 2.220e+02
 5.770e+02 6.320e+02 3.460e+02 6.210e+02 1.000e+00 5.300e+01 3.800e+02
 3.420e+02 2.153e+03 6.110e+02 6.280e+02 4.110e+02 2.040e+02 1.840e+02
 3.610e+02 9.490e+02 3.620e+02 1.810e+02 1.140e+02 2.780e+02 5.290e+02
 2.400e+01 4.550e+02 3.950e+02 1.051e+03 5.480e+02 8.060e+02 3.930e+02
 8.450e+02 2.540e+02 1.680e+02 3.990e+02 2.700e+02 2.760e+02 7.700e+01
 1.940e+02 4.830e+02 4.150e+02 1.000e+02 7.130e+02 6.620e+02 2.550e+02
 3.780e+02 7.800e+01 7.030e+02 3.390e+02 1.460e+02 2.080e+02 5.220e+02
 4.660e+02 1.710e+02 5.210e+02 5.880e+02 6.850e+02 2.110e+02 6.300e+01
 4.880e+02 5.890e+02 6.040e+02 1.959e+03 4.020e+02 6.100e+02 2.500e+02
 3.670e+02 1.020e+02 4.930e+02 5.600e+01 9.640e+02 1.240e+02 5.350e+02
 1.160e+02 1.009e+03 2.030e+02 3.180e+02 6.740e+02 5.600e+02 6.400e+01
 8.050e+02 1.230e+02 4.240e+02 1.620e+02 7.010e+02 2.160e+02 1.270e+02
 3.700e+02 3.630e+02 4.000e+01 7.100e+01 6.150e+02 5.430e+02 2.440e+02
 6.190e+02 2.370e+02 4.050e+02 3.000e+00 1.820e+02 2.520e+02 3.150e+02
 1.010e+02 6.600e+01 1.308e+03 2.320e+02 4.570e+02 3.760e+02 1.170e+02
 4.200e+01 3.010e+02 3.080e+02 6.560e+02 1.330e+02 8.500e+01 7.640e+02
 2.640e+02 1.970e+02 7.860e+02 2.840e+02 8.560e+02 5.520e+02 3.850e+02
 1.206e+03 1.401e+03 4.010e+02 2.310e+02 3.110e+02 5.960e+02 9.400e+01
 5.700e+01 3.790e+02 6.700e+01 3.230e+02 1.200e+02 2.290e+02 1.800e+02
 3.770e+02 3.730e+02 1.344e+03 2.750e+02 2.740e+02 1.380e+02 4.370e+02
 1.065e+03 7.630e+02 2.003e+03 9.800e+01 2.335e+03 5.840e+02 7.370e+02
 1.580e+02 1.527e+03 3.400e+02 1.100e+02 1.248e+03 1.040e+03 1.283e+03
 8.000e+02 8.160e+02 4.030e+02 2.450e+02 4.230e+02 2.810e+02 1.410e+02
 1.890e+02 2.710e+02 3.560e+02 1.570e+02 5.400e+01 2.070e+02 6.180e+02
 3.130e+02 2.790e+02 2.020e+02 5.680e+02 4.450e+02 2.960e+02 6.200e+01
 1.720e+02 1.430e+02 8.700e+01 2.277e+03 4.670e+02 3.646e+03 5.560e+02
 5.640e+02 3.100e+02 1.500e+03 2.130e+02 2.968e+03 1.750e+02 2.800e+01
 2.900e+02 7.950e+02 8.950e+02 8.600e+01 6.020e+02 2.460e+02 8.200e+01
 2.073e+03 3.200e+02 1.377e+03 2.510e+02 1.127e+03 8.490e+02 4.160e+02
 8.770e+02 3.530e+02 2.350e+02 1.280e+02 3.020e+02 4.520e+02 8.360e+02
 4.480e+02 2.230e+02 1.264e+03 6.900e+02 8.300e+01 5.970e+02 1.760e+02
 3.470e+02 4.100e+01 2.560e+02 3.380e+02 4.710e+02 5.690e+02 2.590e+02
 7.300e+01 7.200e+01 8.420e+02 1.500e+02 1.650e+02 2.330e+02 9.100e+01
 8.570e+02 2.430e+02 6.440e+02 2.470e+02 2.820e+02 1.090e+02 1.880e+02
 3.680e+02 9.000e+00 1.830e+02 1.058e+03 9.160e+02 4.100e+02 4.310e+02
 1.980e+02 1.470e+02 3.310e+02 2.200e+02 2.250e+02 1.398e+03 5.070e+02
 3.720e+02 1.030e+02 3.320e+02 5.270e+02 9.200e+01 2.420e+02 6.000e+02
 9.500e+01 2.105e+03 3.580e+02 9.350e+02 1.490e+02 1.080e+02 7.760e+02
 5.720e+02 3.210e+02 2.047e+03 8.400e+01 3.900e+01 1.420e+02 5.590e+02
 5.990e+02 5.450e+02 4.950e+02 4.180e+02 2.319e+03 1.900e+02 4.290e+02
 6.670e+02 6.400e+02 6.120e+02 1.448e+03 4.810e+02 9.190e+02 9.450e+02
 6.230e+02 4.700e+02 3.750e+02 3.570e+02 4.060e+02 1.990e+02 7.910e+02
 2.920e+02 3.120e+02 3.840e+02 2.042e+03 1.920e+02 3.040e+02 6.250e+02
 2.190e+02 3.060e+02 5.060e+02 1.740e+03 1.480e+02 6.720e+02 1.400e+02
 2.260e+02 3.300e+01 6.500e+01 3.920e+02 2.860e+02 2.600e+02 2.200e+01
 4.700e+01 4.200e+02 3.900e+02 7.000e+01 9.300e+01 2.940e+02 4.770e+02
 3.970e+02 8.500e+02 3.200e+01 1.850e+02 5.370e+02 3.070e+02 5.500e+01
 2.360e+02 6.510e+02 1.070e+02 2.900e+01 1.600e+01 4.610e+02 2.120e+02
 2.280e+02 6.330e+02 6.090e+02 5.820e+02 3.500e+02 1.004e+03 3.170e+02
 5.410e+02 6.100e+01 1.053e+03 3.550e+02 3.050e+02 8.000e+01 8.100e+01
 5.310e+02 4.090e+02 3.490e+02 1.350e+02 5.020e+02 3.030e+02 7.500e+01
 2.680e+02 4.600e+02 3.960e+02 8.410e+02 8.350e+02 1.800e+01 5.000e+00
 6.580e+02 3.360e+02 3.270e+02 1.360e+03 1.660e+02 6.960e+02 7.340e+02
 6.540e+02 2.000e+02 5.800e+01 3.820e+02 1.732e+03 4.220e+02 2.150e+02
 4.420e+02 3.440e+02 5.440e+02 1.100e+03 4.400e+01 3.400e+01 3.710e+02
 9.980e+02 2.910e+02 3.340e+02 5.180e+02 5.870e+02 4.300e+01 1.670e+02
 7.540e+02 1.050e+02 6.000e+01 1.314e+03 1.110e+02 2.500e+01 8.010e+02
 2.600e+01 2.340e+02 1.594e+03 5.000e+01 6.380e+02 9.110e+02 4.500e+01
 2.660e+02 4.750e+02 1.437e+03 1.535e+03 2.100e+01 2.180e+02 7.840e+02
 8.170e+02 1.360e+02 3.000e+01 7.240e+02 6.310e+02 6.680e+02 6.770e+02
 2.620e+02 4.000e+02 1.100e+01 7.610e+02 5.760e+02 9.030e+02 1.510e+02
 6.800e+01 3.350e+02 3.250e+02 2.980e+02 9.750e+02 2.800e+02 3.100e+01
 1.290e+02 1.590e+02 7.600e+01 9.150e+02 3.190e+02 1.033e+03 6.470e+02
 4.410e+02 4.850e+02 6.690e+02 5.710e+02 1.125e+03 3.640e+02 4.900e+01
 5.100e+02 1.017e+03 1.080e+03 1.262e+03 1.111e+03 5.090e+02 3.000e+02
 5.810e+02 6.910e+02 9.860e+02 1.040e+02 2.010e+02 8.300e+02 3.540e+02
 6.710e+02 3.800e+01 1.900e+01 3.650e+02 2.700e+01 1.700e+01 2.300e+01
 7.000e+00 1.400e+01 9.600e+01 5.100e+01 7.220e+02 1.700e+02 1.057e+03
 9.620e+02 7.140e+02 1.168e+03 6.240e+02 5.200e+01 4.940e+02 6.730e+02
 4.720e+02 8.620e+02 2.814e+03 1.273e+03 8.850e+02 2.192e+03 1.518e+03
 4.390e+02 9.890e+02 8.510e+02 1.107e+03 3.500e+01 4.144e+03 1.200e+01
 8.880e+02 3.700e+01 5.130e+02 2.000e+01 4.800e+01 8.000e+00 4.040e+02
 3.300e+02 5.530e+02 9.000e+02 7.480e+02 5.800e+02 3.810e+02 5.170e+02
 7.710e+02 8.070e+02 4.600e+01 6.130e+02 3.890e+02 9.080e+02 7.000e+02
 1.514e+03 7.600e+02 5.920e+02 6.000e+00 3.590e+02 2.990e+02 5.700e+02
 1.137e+03 8.690e+02 4.780e+02 1.320e+03 8.090e+02 5.360e+02 9.020e+02
 4.000e+00 1.198e+03 4.250e+02 1.101e+03 1.930e+02 5.550e+02 1.109e+03
 1.076e+03 1.191e+03 8.550e+02 6.450e+02 4.690e+02 5.830e+02 1.083e+03
 6.870e+02 3.520e+02 1.641e+03 2.715e+03 6.360e+02 6.080e+02 4.210e+02
 4.680e+02 1.026e+03 2.400e+02 6.140e+02 7.180e+02 7.350e+02 8.760e+02
 3.690e+02 1.028e+03 1.768e+03 4.130e+02 7.470e+02 2.254e+03 3.140e+02
 6.160e+02 1.140e+03 6.500e+02 1.066e+03 8.400e+02 5.780e+02 6.050e+02
 1.000e+01 7.260e+02 1.470e+03 7.490e+02 1.736e+03 6.820e+02 9.850e+02
 1.015e+03 9.440e+02 5.400e+02 1.420e+03 1.624e+03 5.510e+02 1.110e+03
 8.260e+02 2.195e+03 8.890e+02 1.441e+03 1.061e+03 4.300e+02 8.810e+02
 2.238e+03 5.260e+02 9.220e+02 4.080e+02 1.416e+03 1.182e+03 5.470e+02
 2.720e+02 6.640e+02 8.640e+02 2.067e+03 7.560e+02 5.610e+02 8.590e+02
 5.650e+02 1.516e+03 1.916e+03 2.110e+03 1.848e+03 3.330e+02 4.860e+02
 4.270e+02 7.310e+02 9.780e+02 8.390e+02 7.090e+02 1.509e+03 9.310e+02
 7.800e+02 1.123e+03 5.420e+02 3.400e+03 4.510e+02 1.473e+03 1.189e+03
 7.400e+02 5.000e+02 5.860e+02]

language:
	- Total unique values: 47
	- Values: ['English' nan 'Japanese' 'French' 'Mandarin' 'Aboriginal' 'Spanish'
 'Filipino' 'Hindi' 'Russian' 'Maya' 'Kazakh' 'Telugu' 'Cantonese'
 'Icelandic' 'German' 'Aramaic' 'Italian' 'Dutch' 'Dari' 'Hebrew'
 'Chinese' 'Mongolian' 'Swedish' 'Korean' 'Thai' 'Polish' 'Bosnian'
 'Hungarian' 'Portuguese' 'Danish' 'Arabic' 'Norwegian' 'Czech' 'Kannada'
 'Zulu' 'Panjabi' 'Tamil' 'Dzongkha' 'Vietnamese' 'Indonesian' 'Urdu'
 'Romanian' 'Persian' 'Slovenian' 'Greek' 'Swahili']

country:
	- Total unique values: 66
	- Values: ['USA' 'UK' nan 'New Zealand' 'Canada' 'Australia' 'Belgium' 'Japan'
 'Germany' 'China' 'France' 'New Line' 'Mexico' 'Spain' 'Hong Kong'
 'Czech Republic' 'India' 'Soviet Union' 'South Korea' 'Peru' 'Italy'
 'Russia' 'Aruba' 'Denmark' 'Libya' 'Ireland' 'South Africa' 'Iceland'
 'Switzerland' 'Romania' 'West Germany' 'Chile' 'Netherlands' 'Hungary'
 'Panama' 'Greece' 'Sweden' 'Norway' 'Taiwan' 'Official site' 'Cambodia'
 'Thailand' 'Slovakia' 'Bulgaria' 'Iran' 'Poland' 'Georgia' 'Turkey'
 'Nigeria' 'Brazil' 'Finland' 'Bahamas' 'Argentina' 'Colombia' 'Israel'
 'Egypt' 'Kyrgyzstan' 'Indonesia' 'Pakistan' 'Slovenia' 'Afghanistan'
 'Dominican Republic' 'Cameroon' 'United Arab Emirates' 'Kenya'
 'Philippines']

content_rating:
	- Total unique values: 19
	- Values: ['PG-13' nan 'PG' 'G' 'R' 'TV-14' 'TV-PG' 'TV-MA' 'TV-G' 'Not Rated'
 'Unrated' 'Approved' 'TV-Y' 'NC-17' 'X' 'TV-Y7' 'GP' 'Passed' 'M']

budget:
	- Total unique values: 440
	- Values: [2.3700000e+08 3.0000000e+08 2.4500000e+08 2.5000000e+08           nan
 2.6370000e+08 2.5800000e+08 2.6000000e+08 2.0900000e+08 2.0000000e+08
 2.2500000e+08 2.1500000e+08 2.2000000e+08 2.3000000e+08 1.8000000e+08
 2.0700000e+08 1.5000000e+08 2.1000000e+08 1.7000000e+08 1.9000000e+08
 1.9500000e+08 1.0500000e+08 1.8500000e+08 1.4000000e+08 1.7600000e+08
 1.7800000e+08 1.7500000e+08 1.4500000e+08 1.6500000e+08 1.6000000e+08
 3.8000000e+07 1.5500000e+08 1.0000000e+08 1.4900000e+08 1.4200000e+08
 1.4400000e+08 1.3900000e+08 1.3500000e+08 1.3000000e+08 1.3700000e+08
 1.2000000e+08 1.5000000e+06 1.3200000e+08 1.1000000e+08 1.2500000e+08
 1.2750000e+08 1.2700000e+08 1.0300000e+08 6.5000000e+07 8.5000000e+07
 1.2300000e+08 1.1500000e+08 1.1700000e+08 1.1300000e+08 7.8000000e+07
 1.1600000e+08 1.1200000e+08 9.3000000e+07 1.0700000e+08 1.0900000e+08
 1.3300000e+08 1.0800000e+08 1.2600000e+08 9.0000000e+07 1.0200000e+08
 9.2000000e+07 8.3000000e+07 8.0000000e+07 8.4000000e+07 9.9000000e+07
 1.0000000e+07 9.8000000e+07 9.4000000e+07 9.5000000e+07 7.5000000e+07
 8.8000000e+07 6.8000000e+07 8.6000000e+07 2.0000000e+07 8.7000000e+07
 7.0000000e+07 6.0000000e+07 3.5000000e+07 8.0000000e+06 8.2000000e+07
 8.1000000e+07 7.9000000e+07 4.4000000e+07 4.0000000e+07 5.2000000e+07
 5.8000000e+07 4.5000000e+07 7.6000000e+07 8.1200000e+07 7.3000000e+07
 5.0000000e+07 5.3000000e+07 5.5000000e+07 7.4000000e+07 6.9000000e+07
 7.2000000e+07 5.9660000e+07 7.1500000e+07 6.6000000e+07 6.9500000e+07
 3.6000000e+07 5.9000000e+07 6.3000000e+07 6.2000000e+07 6.1000000e+07
 5.0100000e+07 1.6900000e+07 4.3000000e+07 6.4000000e+07 4.2000000e+07
 4.8000000e+07 3.0000000e+07 6.8005000e+07 5.8800000e+07 3.0000000e+06
 5.7000000e+07 5.6000000e+07 5.4000000e+07 1.4000000e+06 7.1000000e+07
 4.7000000e+07 2.0000000e+06 4.6000000e+07 5.2500000e+07 5.1000000e+07
 5.0200000e+07 2.5000000e+07 3.9000000e+08 4.9900000e+07 2.2000000e+07
 1.8000000e+07 4.9000000e+07 1.4000000e+07 1.0000000e+06 2.5000000e+06
 2.6000000e+07 4.4500000e+07 2.6000000e+06 3.1115000e+07 3.2000000e+07
 3.1000000e+07 2.7000000e+07 4.1000000e+07 3.4000000e+07 5.0000000e+05
 7.7000000e+07 2.4000000e+07 3.3000000e+07 3.9200000e+07 2.3000000e+07
 1.8026148e+07 3.9000000e+07 5.5363200e+08 3.8600000e+07 1.5000000e+07
 3.7000000e+07 2.9500000e+07 3.5200000e+07 2.9000000e+07 1.8000000e+06
 1.0700000e+07 1.9000000e+07 3.2500000e+07 2.8000000e+07 3.1500000e+07
 6.5000000e+06 3.0250000e+07 3.4200000e+07 1.7000000e+07 2.7800000e+07
 2.1000000e+07 1.2000000e+07 2.7500000e+07 1.6000000e+07 1.3500000e+07
 2.5530000e+07 2.5100000e+07 2.8000000e+06 2.5500000e+07 2.1150000e+07
 1.3000000e+07 8.2000000e+06 2.3600000e+07 1.2500000e+07 1.9430000e+07
 1.1000000e+07 2.2700000e+07 2.2500000e+07 2.3500000e+07 2.1500000e+07
 9.0000000e+06 1.9400870e+07 1.9800000e+07 8.0694700e+05 1.9500000e+07
 8.7000000e+06 2.4000000e+09 2.1275199e+09 1.3000000e+04 2.7220000e+07
 1.9400000e+07 1.8500000e+07 2.7000000e+06 1.1350000e+07 3.5000000e+06
 1.7900000e+07 1.7500000e+07 3.0000000e+05 4.0000000e+06 1.6500000e+07
 1.6800000e+07 1.6400000e+07 1.5600000e+07 1.7700000e+07 1.5500000e+07
 1.5300000e+07 9.8000000e+06 7.0000000e+06 1.1500000e+07 6.0000000e+06
 1.4600000e+07 1.4800000e+07 1.4500000e+07 1.4400000e+07 1.4200000e+07
 1.5800000e+07 8.5000000e+06 1.3400000e+07 1.3200000e+07 8.4950000e+06
 1.2620000e+07 3.6600000e+06 1.2800000e+07 1.0500000e+07 9.6000000e+06
 5.0000000e+06 1.2215500e+10 9.2000000e+06 2.5000000e+09 7.5000000e+06
 1.1900000e+07 1.0800000e+07 7.0000000e+08 1.0600000e+07 1.0818775e+07
 1.3800000e+07 1.2305523e+07 1.2600000e+07 6.4000000e+06 6.2000000e+06
 9.5000000e+06 8.9000000e+06 9.4000000e+06 9.3000000e+06 6.0000000e+08
 7.4000000e+06 7.2176000e+06 8.3532000e+04 4.0000000e+08 1.1400000e+07
 8.8000000e+06 8.6000000e+06 7.6230000e+06 8.3000000e+06 8.5500000e+06
 7.2000000e+06 1.1000000e+09 7.9000000e+06 7.7000000e+06 4.5000000e+06
 7.3000000e+06 6.6000000e+06 2.3000000e+06 3.5000000e+05 4.8250000e+06
 6.9000000e+06 6.8000000e+06 4.8000000e+06 6.2440870e+06 7.8400000e+06
 5.9520000e+06 5.3000000e+06 6.7000000e+06 3.5001590e+06 5.6000000e+06
 5.5000000e+06 3.8500000e+06 5.2500000e+06 3.0300000e+07 5.1000000e+06
 4.9000000e+06 3.3000000e+06 8.0000000e+05 2.2000000e+06 3.2090000e+06
 8.4450000e+07 8.9000000e+05 4.7000000e+06 4.6387830e+06 4.6000000e+06
 4.2000000e+09 4.4000000e+06 4.2000000e+06 1.1400000e+05 3.6000000e+06
 3.2000000e+06 3.4000000e+06 1.3000000e+06 6.5000000e+05 3.9500000e+06
 3.8000000e+06 3.9770000e+06 3.7687850e+06 3.7000000e+06 3.7169460e+06
 1.9900000e+07 3.4400000e+06 4.3000000e+06 3.1800000e+06 4.4903750e+06
 1.9000000e+06 2.9000000e+06 2.8838480e+06 2.6865850e+06 2.6500000e+06
 2.6270000e+06 2.5408000e+06 3.4000000e+04 8.4000000e+06 2.4000000e+06
 2.3610000e+06 2.4500000e+06 2.2954290e+06 2.2800000e+06 2.1600000e+06
 1.2000000e+06 2.1000000e+06 1.6140000e+06 1.4000000e+04 1.0000000e+05
 1.2500000e+06 1.9500000e+06 1.7500000e+06 1.7000000e+06 1.6447360e+06
 1.6500000e+06 1.6000000e+06 1.1000000e+06 1.6963770e+06 1.4550000e+06
 3.1500000e+06 1.3778000e+06 9.6000000e+05 1.5920000e+06 1.2880000e+06
 4.2700000e+05 1.4200000e+06 6.9539300e+05 9.5000000e+05 1.0000000e+09
 9.0000000e+05 9.8900000e+05 9.1300000e+05 9.1000000e+05 9.3000000e+05
 5.9000000e+05 8.5000000e+05 8.2500000e+05 9.9000000e+05 6.0000000e+05
 7.8000000e+05 7.7700000e+05 7.5000000e+05 7.0000000e+05 4.0000000e+05
 6.2500000e+05 6.0900000e+05 6.0000000e+04 5.6000000e+05 5.5000000e+05
 4.6000000e+04 1.5000000e+05 4.7500000e+05 4.5000000e+05 4.3900000e+05
 2.2500000e+05 1.0661670e+06 1.5000000e+04 2.2957500e+05 2.1800000e+02
 3.8590700e+05 3.7500000e+05 3.7900000e+05 1.7502110e+06 3.2500000e+05
 3.1200000e+05 2.0000000e+05 1.6000000e+05 2.5000000e+05 2.7000000e+05
 2.9000000e+05 3.6500000e+05 2.4500000e+05 2.4000000e+05 2.1000000e+05
 1.8000000e+05 1.2000000e+05 1.7500000e+05 1.6800000e+05 1.2500000e+05
 1.0300000e+05 2.0000000e+04 4.0000000e+04 7.0000000e+04 7.5000000e+04
 6.5000000e+04 6.2000000e+04 2.5000000e+04 5.0000000e+04 4.2000000e+04
 4.5000000e+04 3.0000000e+04 2.3000000e+05 2.7000000e+04 2.4000000e+04
 2.3000000e+04 2.2000000e+04 1.7350000e+04 1.0000000e+04 4.5000000e+03
 7.0000000e+03 3.2500000e+03 9.0000000e+03 1.4000000e+03 1.1000000e+03]

title_year:
	- Total unique values: 92
	- Values: [2009. 2007. 2015. 2012.   nan 2010. 2016. 2006. 2008. 2013. 2011. 2014.
 2005. 1997. 2004. 1999. 1995. 2003. 2001. 2002. 1998. 2000. 1990. 1991.
 1994. 1996. 1982. 1993. 1979. 1992. 1989. 1984. 1988. 1978. 1962. 1980.
 1972. 1981. 1968. 1985. 1940. 1963. 1987. 1986. 1973. 1983. 1976. 1977.
 1970. 1971. 1969. 1960. 1965. 1964. 1927. 1974. 1937. 1975. 1967. 1951.
 1961. 1946. 1953. 1954. 1959. 1932. 1947. 1956. 1945. 1952. 1930. 1966.
 1939. 1950. 1948. 1958. 1957. 1943. 1944. 1938. 1949. 1936. 1941. 1955.
 1942. 1929. 1935. 1933. 1916. 1934. 1925. 1920.]

actor_2_facebook_likes:
	- Total unique values: 918
	- Values: [9.36e+02 5.00e+03 3.93e+02 2.30e+04 1.20e+01 6.32e+02 1.10e+04 5.53e+02
 2.10e+04 4.00e+03 1.00e+04 4.12e+02 2.00e+03 3.00e+03 2.16e+02 8.16e+02
 9.72e+02 8.82e+02 6.00e+03 9.19e+02 1.40e+04 1.90e+04 5.63e+02 2.50e+04
 8.08e+02 7.79e+02 5.81e+02 9.56e+02 1.50e+04 3.68e+02 1.00e+03 2.20e+04
 9.81e+02 5.57e+02 5.09e+02 5.67e+02 9.68e+02 8.29e+02 1.50e+02 1.19e+02
 7.29e+02 2.68e+02 4.68e+02 1.90e+02 1.30e+04 8.48e+02 9.73e+02 1.20e+04
 1.60e+04 3.36e+02 8.54e+02 6.00e+01 7.67e+02 1.70e+04 5.25e+02 2.25e+02
 6.38e+02 7.19e+02 9.31e+02 7.26e+02 8.12e+02 9.53e+02 2.84e+02 2.70e+04
 1.06e+02 4.18e+02 7.95e+02 7.16e+02 8.20e+01 9.61e+02 5.99e+02 9.00e+03
 8.51e+02 2.69e+02 5.23e+02 1.98e+02 2.00e+04 7.45e+02 7.59e+02 6.07e+02
 8.97e+02 4.90e+02 8.52e+02 7.56e+02 5.51e+02 5.62e+02 5.48e+02 7.70e+02
 7.66e+02 3.70e+02 3.15e+02 5.50e+02 8.93e+02 9.34e+02 7.80e+02 3.00e+02
 5.36e+02 1.72e+02 3.21e+02 4.00e+02 8.81e+02 4.11e+02 9.67e+02 8.98e+02
 4.42e+02 7.02e+02 8.71e+02 9.43e+02 5.58e+02 5.70e+02 6.87e+02 5.74e+02
 2.37e+02 5.05e+02 9.79e+02 3.08e+02 3.72e+02 8.26e+02 8.90e+02 7.22e+02
 7.94e+02 3.58e+02 7.01e+02 3.65e+02 7.99e+02 8.50e+02 1.07e+02 9.92e+02
 2.76e+02 8.36e+02 8.09e+02 2.30e+01 2.93e+02 8.33e+02 3.60e+02 8.00e+03
 3.92e+02 5.54e+02 1.13e+02 5.13e+02 9.29e+02 7.10e+01 6.35e+02 7.43e+02
 8.86e+02 5.78e+02 8.01e+02      nan 6.31e+02 9.33e+02 7.98e+02 6.58e+02
 6.04e+02 7.82e+02 4.64e+02 7.13e+02 7.73e+02 7.00e+02 4.52e+02 6.25e+02
 7.62e+02 1.51e+02 7.10e+02 4.75e+02 6.53e+02 8.25e+02 7.35e+02 2.94e+02
 9.11e+02 6.95e+02 5.37e+02 4.20e+01 5.08e+02 1.96e+02 8.20e+02 3.31e+02
 7.40e+02 5.20e+02 5.60e+02 9.39e+02 8.57e+02 5.95e+02 4.19e+02 7.87e+02
 7.23e+02 9.03e+02 1.65e+02 6.27e+02 5.26e+02 2.56e+02 8.06e+02 5.00e+02
 5.59e+02 8.02e+02 6.00e+02 5.69e+02 6.24e+02 2.49e+02 1.17e+02 5.77e+02
 6.43e+02 6.02e+02 6.91e+02 6.55e+02 9.47e+02 9.46e+02 4.10e+02 8.05e+02
 1.42e+02 1.83e+02 9.06e+02 6.60e+02 6.50e+02 3.50e+01 2.27e+02 5.79e+02
 1.74e+02 3.88e+02 3.45e+02 5.92e+02 7.60e+02 1.69e+02 9.75e+02 7.00e+03
 6.70e+02 8.11e+02 5.33e+02 4.37e+02 2.23e+02 3.11e+02 8.69e+02 4.00e+00
 8.78e+02 9.82e+02 9.60e+02 1.77e+02 1.37e+02 6.42e+02 5.03e+02 2.43e+02
 6.39e+02 4.96e+02 2.77e+02 4.48e+02 2.10e+01 2.74e+02 3.96e+02 7.39e+02
 4.80e+01 1.80e+04 6.74e+02 9.71e+02 7.88e+02 0.00e+00 8.00e+01 6.30e+01
 3.07e+02 9.40e+01 1.61e+02 4.22e+02 9.04e+02 8.72e+02 5.06e+02 4.36e+02
 4.30e+02 7.64e+02 5.90e+02 4.27e+02 3.39e+02 9.88e+02 4.66e+02 5.76e+02
 5.49e+02 1.30e+02 4.86e+02 9.64e+02 9.70e+02 4.50e+02 3.03e+02 1.03e+02
 5.52e+02 8.89e+02 8.45e+02 2.70e+02 5.61e+02 6.92e+02 3.83e+02 4.40e+02
 1.70e+01 5.29e+02 3.09e+02 8.43e+02 8.61e+02 8.34e+02 6.97e+02 5.88e+02
 4.05e+02 5.85e+02 2.29e+02 3.27e+02 2.17e+02 8.41e+02 3.24e+02 3.26e+02
 2.20e+02 4.51e+02 8.55e+02 1.49e+02 3.63e+02 3.94e+02 9.55e+02 1.45e+02
 7.93e+02 6.98e+02 4.95e+02 7.30e+02 5.17e+02 7.18e+02 3.46e+02 5.80e+02
 3.17e+02 3.80e+02 1.92e+02 7.83e+02 9.09e+02 5.93e+02 8.50e+01 2.70e+01
 5.22e+02 6.10e+02 9.25e+02 9.13e+02 1.80e+01 8.62e+02 9.66e+02 6.64e+02
 8.99e+02 3.30e+02 1.47e+02 3.87e+02 5.01e+02 6.17e+02 2.63e+02 4.97e+02
 9.76e+02 2.08e+02 2.40e+02 2.57e+02 6.28e+02 3.34e+02 8.84e+02 9.44e+02
 6.80e+02 7.20e+02 4.67e+02 6.68e+02 3.01e+02 7.48e+02 4.17e+02 2.19e+02
 3.99e+02 3.44e+02 2.33e+02 4.41e+02 1.02e+02 2.65e+02 2.90e+02 3.29e+02
 7.08e+02 2.98e+02 9.62e+02 8.91e+02 9.35e+02 8.47e+02 6.63e+02 8.64e+02
 5.35e+02 8.27e+02 5.07e+02 8.60e+02 1.10e+02 5.91e+02 2.73e+02 4.00e+01
 4.16e+02 5.18e+02 3.90e+02 7.96e+02 9.63e+02 4.60e+02 6.40e+02 8.79e+02
 6.05e+02 8.28e+02 5.20e+01 3.67e+02 2.45e+02 6.20e+01 6.37e+02 9.57e+02
 5.34e+02 5.68e+02 7.21e+02 1.35e+02 5.00e+01 8.83e+02 9.20e+02 8.18e+02
 9.89e+02 2.58e+02 1.31e+02 1.71e+02 6.51e+02 9.12e+02 3.48e+02 9.95e+02
 8.22e+02 2.89e+02 8.23e+02 4.14e+02 7.00e+00 4.72e+02 4.55e+02 1.84e+02
 5.12e+02 9.23e+02 6.94e+02 6.19e+02 9.40e+02 3.42e+02 5.31e+02 8.44e+02
 5.55e+02 2.54e+02 6.29e+02 5.75e+02 9.77e+02 9.54e+02 2.90e+01 7.69e+02
 1.15e+02 5.41e+02 4.23e+02 6.23e+02 3.37e+02 5.30e+02 9.49e+02 7.24e+02
 9.10e+01 8.07e+02 1.08e+02 9.02e+02 2.79e+02 5.84e+02 6.41e+02 7.06e+02
 8.94e+02 4.63e+02 3.04e+02 3.62e+02 4.61e+02 4.84e+02 4.40e+01 6.80e+01
 2.02e+02 1.16e+02 1.41e+02 6.11e+02 7.38e+02 8.49e+02 2.80e+01 8.15e+02
 3.49e+02 9.22e+02 4.30e+01 7.86e+02 4.89e+02 1.52e+02 3.12e+02 8.10e+01
 8.96e+02 4.49e+02 1.57e+02 9.84e+02 8.74e+02 8.76e+02 1.37e+05 8.35e+02
 9.80e+02 7.80e+01 2.44e+02 4.26e+02 3.51e+02 6.89e+02 4.13e+02 4.70e+01
 7.41e+02 6.90e+02 6.10e+01 6.82e+02 6.83e+02 2.99e+02 6.13e+02 9.41e+02
 3.22e+02 2.53e+02 9.26e+02 7.54e+02 9.08e+02 8.87e+02 3.82e+02 5.96e+02
 5.45e+02 2.48e+02 4.60e+01 3.00e+00 7.55e+02 1.33e+02 5.42e+02 5.16e+02
 2.85e+02 5.43e+02 2.10e+02 3.16e+02 1.63e+02 9.24e+02 4.76e+02 5.97e+02
 6.69e+02 7.44e+02 7.07e+02 4.85e+02 5.21e+02 8.39e+02 7.90e+01 6.45e+02
 7.34e+02 6.18e+02 3.98e+02 7.42e+02 8.38e+02 3.38e+02 2.24e+02 9.15e+02
 6.48e+02 4.88e+02 4.45e+02 1.75e+02 7.30e+01 5.73e+02 3.28e+02 2.14e+02
 6.03e+02 2.06e+02 2.20e+01 4.57e+02 2.00e+00 9.00e+00 8.56e+02 7.74e+02
 6.49e+02 9.80e+01 1.50e+01 5.94e+02 4.03e+02 9.45e+02 4.91e+02 2.42e+02
 9.17e+02 1.60e+02 3.59e+02 2.39e+02 8.77e+02 4.33e+02 6.54e+02 2.04e+02
 6.52e+02 4.59e+02 4.39e+02 2.55e+02 2.21e+02 1.89e+02 7.20e+01 8.21e+02
 5.71e+02 3.30e+01 7.12e+02 2.71e+02 6.81e+02 2.91e+02 1.99e+02 7.76e+02
 9.42e+02 4.01e+02 6.86e+02 9.30e+01 3.43e+02 5.86e+02 2.36e+02 1.43e+02
 7.85e+02 6.78e+02 3.78e+02 1.81e+02 1.10e+01 4.09e+02 9.48e+02 4.29e+02
 5.90e+01 4.02e+02 4.47e+02 7.63e+02 2.86e+02 1.94e+02 3.47e+02 4.28e+02
 6.26e+02 9.37e+02 2.61e+02 9.27e+02 7.27e+02 9.85e+02 9.90e+01 4.82e+02
 5.00e+00 8.88e+02 1.54e+02 1.46e+02 1.32e+02 5.28e+02 2.64e+02 6.93e+02
 5.47e+02 2.32e+02 5.87e+02 9.50e+01 2.52e+02 6.01e+02 6.50e+01 2.13e+02
 6.77e+02 3.71e+02 1.39e+02 7.36e+02 1.11e+02 3.77e+02 8.90e+01 7.32e+02
 6.73e+02 6.34e+02 4.44e+02 8.37e+02 3.74e+02 1.55e+02 2.46e+02 4.34e+02
 9.18e+02 4.99e+02 2.00e+01 5.40e+01 3.00e+01 3.10e+01 2.66e+02 2.34e+02
 7.49e+02 1.73e+02 8.70e+02 1.28e+02 4.43e+02 2.95e+02 4.81e+02 8.59e+02
 8.30e+01 5.04e+02 9.91e+02 2.28e+02 7.97e+02 2.81e+02 8.60e+01 1.23e+02
 5.39e+02 2.50e+01 7.75e+02 2.41e+02 2.01e+02 1.34e+02 1.04e+02 2.26e+02
 8.75e+02 1.00e+01 9.01e+02 9.00e+02 3.66e+02 8.30e+02 9.07e+02 4.65e+02
 1.68e+02 1.91e+02 3.33e+02 7.60e+01 3.85e+02 1.00e+02 2.97e+02 9.38e+02
 5.50e+01 3.70e+01 8.04e+02 3.32e+02 9.20e+01 4.94e+02 8.70e+01 4.10e+01
 4.74e+02 6.12e+02 3.84e+02 1.78e+02 9.69e+02 5.32e+02 5.60e+01 7.50e+02
 8.67e+02 3.60e+01 2.82e+02 4.71e+02 2.30e+02 3.90e+01 4.24e+02 6.56e+02
 4.56e+02 1.44e+02 5.66e+02 8.00e+02 2.62e+02 3.76e+02 1.93e+02 3.41e+02
 6.70e+01 2.78e+02 3.06e+02 2.03e+02 2.72e+02 6.36e+02 3.57e+02 7.51e+02
 1.22e+02 9.60e+01 2.87e+02 1.18e+02 9.21e+02 1.70e+02 5.70e+01 2.92e+02
 3.40e+02 3.80e+01 4.31e+02 9.83e+02 5.11e+02 3.10e+02 6.22e+02 6.46e+02
 6.14e+02 1.64e+02 6.60e+01 3.19e+02 4.90e+01 3.97e+02 1.30e+01 4.50e+01
 3.53e+02 1.79e+02 1.62e+02 3.55e+02 5.56e+02 2.15e+02 4.69e+02 7.40e+01
 3.20e+01 1.14e+02 3.91e+02 6.90e+01 1.25e+02 3.50e+02 9.97e+02 8.42e+02
 1.53e+02 3.05e+02 2.38e+02 1.85e+02 3.20e+02 1.88e+02 5.27e+02 7.25e+02
 4.83e+02 6.85e+02 6.00e+00 5.44e+02 7.11e+02 1.56e+02 2.90e+04 2.75e+02
 4.46e+02 2.11e+02 7.15e+02 1.86e+02 2.18e+02 6.16e+02 2.96e+02 2.00e+02
 1.09e+02 1.36e+02 6.08e+02 4.62e+02 4.32e+02 1.76e+02 2.60e+01 3.18e+02
 1.27e+02 8.92e+02 1.97e+02 2.12e+02 8.00e+00 2.09e+02 3.14e+02 9.05e+02
 5.24e+02 6.59e+02 4.15e+02 4.79e+02 8.65e+02 1.05e+02 2.59e+02 4.21e+02
 1.67e+02 1.12e+02 3.73e+02 9.86e+02 5.10e+01 1.01e+02 7.70e+01 2.88e+02
 3.40e+01 5.14e+02 3.23e+02 4.38e+02 5.30e+01 4.07e+02 1.20e+02 7.78e+02
 5.72e+02 2.80e+02 4.35e+02 6.99e+02 6.15e+02 7.33e+02 4.80e+02 1.26e+02
 3.56e+02 2.22e+02 2.31e+02 1.87e+02 6.65e+02 6.33e+02 1.29e+02 6.40e+01
 5.82e+02 8.80e+01 1.60e+01 2.51e+02 8.10e+02 8.13e+02 2.60e+02 3.25e+02
 1.24e+02 3.02e+02 6.06e+02 7.47e+02 1.40e+01 1.58e+02 2.47e+02 6.75e+02
 8.40e+01 3.79e+02 1.59e+02 4.25e+02 3.54e+02 1.48e+02 5.89e+02 7.52e+02
 4.53e+02 4.58e+02 7.00e+01 4.54e+02 7.81e+02 2.40e+01 1.21e+02 4.04e+02
 4.20e+02 1.90e+01 3.61e+02 3.95e+02 2.07e+02 1.40e+02 5.02e+02 9.00e+01
 5.19e+02 4.92e+02 2.35e+02 7.09e+02 4.87e+02 5.80e+01 8.73e+02 3.75e+02
 7.50e+01 1.80e+02 1.66e+02 6.57e+02 2.05e+02 4.70e+02]

imdb_score:
	- Total unique values: 78
	- Values: [7.9 7.1 6.8 8.5 6.6 6.2 7.8 7.5 6.9 6.1 6.7 7.3 6.5 7.2 8.1 7.  7.7 8.2
 5.9 6.  5.7 6.4 6.3 5.6 8.3 8.  8.4 5.8 5.4 9.  4.8 5.2 7.6 4.5 5.5 8.6
 8.8 5.1 7.4 4.2 5.  4.9 3.7 5.3 4.3 3.8 4.4 3.3 2.2 8.9 8.7 4.6 2.4 3.4
 4.1 4.7 3.  3.6 3.5 2.7 1.7 4.  2.  9.3 2.9 3.9 2.8 2.3 1.9 3.1 9.5 9.1
 1.6 2.5 2.1 3.2 9.2 2.6]

aspect_ratio:
	- Total unique values: 23
	- Values: [ 1.78  2.35   nan  1.85  2.    2.2   2.39  2.24  1.33  4.    1.66  1.5
 16.    1.77  2.4   1.37  2.76  1.18  1.44  2.55  1.2   1.75  1.89]

movie_facebook_likes:
	- Total unique values: 876
	- Values: [ 33000      0  85000 164000  24000  29000 118000  10000 197000   5000
  48000 123000  58000  40000  65000  56000  17000  83000  26000  72000
  44000 150000  80000  95000  60000  41000  30000  94000 129000  82000
  92000  22000 115000  23000  46000  20000  39000  16000  13000  54000
  37000  27000  42000   2000  77000  18000  53000  89000  45000    677
  35000  55000  67000  96000 349000 175000 166000  14000  38000  11000
   8000  15000  63000 191000  19000  47000  62000   3000  25000  51000
 190000   6000  61000  71000     40     25  52000  31000 122000  97000
    459  68000  28000    291 147000  12000   4000    304  36000    894
  21000    946 153000     53 199000 108000 138000 124000    881    416
    578  66000    701   1000   9000  70000    988    979    788  59000
    372    863  49000    941    374   7000  57000 140000     91    607
    951  32000    257    665    964    995    785    138    413    893
    509 105000  43000    648    683    880    266    886    694  34000
    792    531    997    584    391    815    764    617  98000 144000
    688    892    177    295 114000    912 146000    885    781    858
    747    829    797  64000    621    448    690    956    846    470
    589    791    641    426 117000    296 112000    990    500    472
    782    960    953    316    610    437    361    853    672    605
  74000  90000    795    955    624    970    773    669    812    718
    877    705    915    779    697    943    422    288    255    352
    828    522    261    616    975    115  86000    743    593    451
    916    652     68    353    748  50000    455    161    505 101000
    681    663    263    390    445    301    702    394    279    299
    680    309    555    604    754    835    630    849    579    612
    866    328    401    158    565    211    823    265     89    478
    989    689    168    911    389    474    366    240    852    591
    133    638    919    913    949    538    452    433    471    704
    494    561    675    262    633     26    120    314    742    463
    783    350    643    625    559    290 149000    181    883  81000
  75000    602  78000    771    592    484    491    999    629    204
    462    654    387    287    447    201    209  76000    418  99000
    517  73000    464    845    315    817    188    359    736    874
    167    260    319    145    826    504    567    488    425    660
    542    666    187    329    671    686    657    901    834    564
    498    377    246    378    503    215    581    982    348    739
    206     83    937    507    942    636     90    200    891    876
    619    650    153    839    408 148000    376    532    767    765
    887    804    716    104    272    687    758    897    141    365
    284    800 131000    806    302    227    347    930    613    590
    431    124    501    264      4    269    713    983    977    444
    311    855    676    233    924    548    346    847    608    971
    235    241    831    973    399    136    217     30     58     70
    332    560    238    228    599    588    813    562    827    342
    271    320    476    814    110    339    419    300    154    903
    393    487    492    371    818    392    573    762    356    921
    939    824    282    191    558 165000    896    810    466    243
    673    327    175    116    489    566    622    251    436    247
    576    923    816    515    647    157    733    843    708    664
    193    473     18    821    277    998    373    184    530    889
    411    645    932    905    860    231    127     76    512    934
     64    502 106000    833 109000    862    679     22    618    186
     69    107    139    337    344    770  93000    902    549    151
    439    242    656    305    135    423     88    458    784    634
    355    443    774    541    745    449    405    550    174    967
     85    659    646    119    250    981    744    368    720    331
    851    441    808    518     49    398    944    345    208    106
    533     74    725    952     33    117    661    321    370    546
    402    160    695    486    746    850    957    898    580    838
    313    375    128    278    196    341    467    495    938    125
     77    859    429     79    412    226    434     75    143    294
    520    438     81    205    508    140     31    655    223    108
    129    978    407     44    446    357    962    122    724     97
     55    453     41    586    479    802     19    830    403    256
    985    926     28     84    289     29     52    224    570    176
    275    543    109    244    113    421    620    963    232    575
    270    738    213    740    207    729    249    920    323    225
    274    777    639     38    165    190     35    750    864     11
      2    199    285    539    528     62     61    283    842    118
    414    931    131    987    872    111    614    182    132     98
    325    798    170    237    456    385    358    280    360    121
    721    482    606    799    100     63    710    609    933    635
    545    837    595    367    417    430     47    450    442     48
     54     32    406    395     37    229    480    968    369    349
    381     50    254    819     39     16    144  79000    172    870
    162     14    380    763    236     60    134     43    166    974
    857     10    715    195    825    805    651    728    298    544
    511     82    169      8    569    326     73    794    869    844
      9    682    415    307    212    216    594     57    400    969
    220     66    632    221    756     27     87    477    954     51
    234    603     92    571     24    150    587    706    210    793
     17     42    259    297      3    936    884    698    409    312
    219    130    396    163    420    523    303    102    203      5
     12     34     36      7    126    180     86    155    183     23
    707    123    460    865    194    526    691    583    379    363
    114     46    178     93     67    519     71     72    801    465
     96    198    267    142     65    197    667    239     13    535
     45    324    171    105    424     20]

General analysis of the dataset¶

Basic summary statistics for the numeric columns of the dataset

In [15]:
dataset.describe()
Out[15]:
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_1_facebook_likes gross num_voted_users cast_total_facebook_likes facenumber_in_poster num_user_for_reviews budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
count 4993.000000 5028.000000 4939.000000 5020.000000 5036.000000 4.159000e+03 5.043000e+03 5043.000000 5030.000000 5022.000000 4.551000e+03 4935.000000 5030.000000 5043.000000 4714.000000 5043.000000
mean 140.194272 107.201074 686.509212 645.009761 6560.047061 4.846841e+07 8.366816e+04 9699.063851 1.371173 272.770808 3.975262e+07 2002.470517 1651.754473 6.442138 2.220403 7525.964505
std 121.601675 25.197441 2813.328607 1665.041728 15020.759120 6.845299e+07 1.384853e+05 18163.799124 2.013576 377.982886 2.061149e+08 12.474599 4042.438863 1.125116 1.385113 19320.445110
min 1.000000 7.000000 0.000000 0.000000 0.000000 1.620000e+02 5.000000e+00 0.000000 0.000000 1.000000 2.180000e+02 1916.000000 0.000000 1.600000 1.180000 0.000000
25% 50.000000 93.000000 7.000000 133.000000 614.000000 5.340988e+06 8.593500e+03 1411.000000 0.000000 65.000000 6.000000e+06 1999.000000 281.000000 5.800000 1.850000 0.000000
50% 110.000000 103.000000 49.000000 371.500000 988.000000 2.551750e+07 3.435900e+04 3090.000000 1.000000 156.000000 2.000000e+07 2005.000000 595.000000 6.600000 2.350000 166.000000
75% 195.000000 118.000000 194.500000 636.000000 11000.000000 6.230944e+07 9.630900e+04 13756.500000 2.000000 326.000000 4.500000e+07 2011.000000 918.000000 7.200000 2.350000 3000.000000
max 813.000000 511.000000 23000.000000 23000.000000 640000.000000 7.605058e+08 1.689764e+06 656730.000000 43.000000 5060.000000 1.221550e+10 2016.000000 137000.000000 9.500000 16.000000 349000.000000
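
The summary above shows a few maxima far from the bulk of the data: `budget` peaks at 1.221550e+10 while its 75th percentile is 4.5e+07, and `aspect_ratio` peaks at 16 while its 75th percentile is 2.35. One common way to flag such values is Tukey's 1.5 × IQR rule. The sketch below is only an illustration of that heuristic, not a step taken in this notebook; the `budget` series is a hypothetical toy sample, and the threshold multiplier 1.5 is a conventional assumption.

```python
import pandas as pd

# Hypothetical budgets in USD; the last value mimics the extreme maximum
# reported by describe(). This is toy data, not the real column.
budget = pd.Series(
    [6.0e6, 2.0e7, 2.5e7, 4.5e7, 6.0e7, 1.22155e10], name="budget"
)

# Interquartile range and the upper fence of Tukey's rule.
q1, q3 = budget.quantile(0.25), budget.quantile(0.75)
iqr = q3 - q1
upper = q3 + 1.5 * iqr

# Rows above the fence are candidates for manual inspection, not automatic removal.
outliers = budget[budget > upper]
print(outliers)
```

On the real data the same lines could be applied to `dataset["budget"]`; whether a flagged value is an error (for example, a budget recorded in a non-USD currency) would still need to be checked case by case.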

Data types and non-null counts for each column in the dataset

In [16]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5043 entries, 0 to 5042
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      5024 non-null   object 
 1   director_name              4939 non-null   object 
 2   num_critic_for_reviews     4993 non-null   float64
 3   duration                   5028 non-null   float64
 4   director_facebook_likes    4939 non-null   float64
 5   actor_3_facebook_likes     5020 non-null   float64
 6   actor_2_name               5030 non-null   object 
 7   actor_1_facebook_likes     5036 non-null   float64
 8   gross                      4159 non-null   float64
 9   genres                     5043 non-null   object 
 10  actor_1_name               5036 non-null   object 
 11  movie_title                5043 non-null   object 
 12  num_voted_users            5043 non-null   int64  
 13  cast_total_facebook_likes  5043 non-null   int64  
 14  actor_3_name               5020 non-null   object 
 15  facenumber_in_poster       5030 non-null   float64
 16  plot_keywords              4890 non-null   object 
 17  movie_imdb_link            5043 non-null   object 
 18  num_user_for_reviews       5022 non-null   float64
 19  language                   5029 non-null   object 
 20  country                    5038 non-null   object 
 21  content_rating             4740 non-null   object 
 22  budget                     4551 non-null   float64
 23  title_year                 4935 non-null   float64
 24  actor_2_facebook_likes     5030 non-null   float64
 25  imdb_score                 5043 non-null   float64
 26  aspect_ratio               4714 non-null   float64
 27  movie_facebook_likes       5043 non-null   int64  
dtypes: float64(13), int64(3), object(12)
memory usage: 1.1+ MB

Cleaning irrelevant or "dirty" columns¶

The following variables will be dropped from the dataset because they add noise to the model and contribute no real predictive value:

Variable Reason
movie_imdb_link External identifier (URL)
movie_title Nominal identifier with no predictive value
director_name Nominal identifier with no predictive value
actor_1_name Nominal identifier with no predictive value
actor_2_name Nominal identifier with no predictive value
actor_3_name Nominal identifier with no predictive value
plot_keywords Free text that would only be useful with NLP
In [17]:
cols_to_drop = [
    'movie_imdb_link', 'movie_title', 'director_name',
    'actor_1_name', 'actor_2_name', 'actor_3_name',
    'plot_keywords',
]

dataset = dataset.drop(columns=cols_to_drop)
In [18]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5043 entries, 0 to 5042
Data columns (total 21 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      5024 non-null   object 
 1   num_critic_for_reviews     4993 non-null   float64
 2   duration                   5028 non-null   float64
 3   director_facebook_likes    4939 non-null   float64
 4   actor_3_facebook_likes     5020 non-null   float64
 5   actor_1_facebook_likes     5036 non-null   float64
 6   gross                      4159 non-null   float64
 7   genres                     5043 non-null   object 
 8   num_voted_users            5043 non-null   int64  
 9   cast_total_facebook_likes  5043 non-null   int64  
 10  facenumber_in_poster       5030 non-null   float64
 11  num_user_for_reviews       5022 non-null   float64
 12  language                   5029 non-null   object 
 13  country                    5038 non-null   object 
 14  content_rating             4740 non-null   object 
 15  budget                     4551 non-null   float64
 16  title_year                 4935 non-null   float64
 17  actor_2_facebook_likes     5030 non-null   float64
 18  imdb_score                 5043 non-null   float64
 19  aspect_ratio               4714 non-null   float64
 20  movie_facebook_likes       5043 non-null   int64  
dtypes: float64(13), int64(3), object(5)
memory usage: 827.5+ KB
In [19]:
dataset.head()
Out[19]:
color num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_1_facebook_likes gross genres num_voted_users cast_total_facebook_likes ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color 723.0 178.0 0.0 855.0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi 886204 4834 ... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color 302.0 169.0 563.0 1000.0 40000.0 309404152.0 Action|Adventure|Fantasy 471220 48350 ... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
2 Color 602.0 148.0 0.0 161.0 11000.0 200074175.0 Action|Adventure|Thriller 275868 11700 ... 994.0 English UK PG-13 245000000.0 2015.0 393.0 6.8 2.35 85000
3 Color 813.0 164.0 22000.0 23000.0 27000.0 448130642.0 Action|Thriller 1144337 106759 ... 2701.0 English USA PG-13 250000000.0 2012.0 23000.0 8.5 2.35 164000
4 NaN NaN NaN 131.0 NaN 131.0 NaN Documentary 8 143 ... NaN NaN NaN NaN NaN NaN 12.0 7.1 NaN 0

5 rows × 21 columns

Inferring variable types¶

Define the analysis target. The target variable depends on the goal of the analysis, for example:

  • To predict a movie's quality or success: imdb_score
  • To predict revenue: gross
  • To predict the audience rating: content_rating
  • To predict social-media popularity: movie_facebook_likes
  • To predict the release year: title_year
  • To predict the genre: genres

In this example I will use imdb_score as the target.

In [20]:
target = 'imdb_score'
features = [i for i in dataset.columns if i not in [target]]
number_unique_rows = dataset[features].nunique()

Inferring numerical and categorical features

In [21]:
numerical_features = []
categorical_features = []

for col in features:
    if dataset[col].dtype == 'object' or number_unique_rows[col] <= 45:
        categorical_features.append(col)
    else: 
        numerical_features.append(col)

print('\n\033[1mInferencia:\033[0m El dataset tiene {} features numéricas y {} features categóricas.'.format(len(numerical_features),len(categorical_features)))
Inferencia: El dataset tiene 13 features numéricas y 7 features categóricas.
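The rule in the cell above can be exercised on a small frame to see which side of the split each column lands on. This is a minimal sketch: the helper name split_features and the toy data are hypothetical, and the cutoff of 45 unique values is simply carried over from the notebook.

```python
import pandas as pd

def split_features(df, target, max_unique=45):
    """Split columns into (numerical, categorical) with the notebook's heuristic."""
    numerical, categorical = [], []
    for col in df.columns:
        if col == target:
            continue
        # Text columns and low-cardinality columns are treated as categorical.
        if df[col].dtype == "object" or df[col].nunique() <= max_unique:
            categorical.append(col)
        else:
            numerical.append(col)
    return numerical, categorical

# Hypothetical toy data, not taken from the movie dataset.
toy = pd.DataFrame({
    "color": ["Color", "Color", "Black and White"] * 20,
    "faces": [0, 1, 2] * 20,                        # numeric, only 3 distinct values
    "gross": [float(i) * 1.5 for i in range(60)],   # 60 distinct values
    "imdb_score": [5.0 + (i % 10) / 10 for i in range(60)],
})

num_cols, cat_cols = split_features(toy, target="imdb_score")
print(num_cols, cat_cols)
```

Note that a numeric column with few distinct values, such as faces here, is classified as categorical. The same thing happens to facenumber_in_poster and aspect_ratio under this rule.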

Checking for null values¶

Number of nulls per column

In [22]:
dataset.isnull().sum()
Out[22]:
color                         19
num_critic_for_reviews        50
duration                      15
director_facebook_likes      104
actor_3_facebook_likes        23
actor_1_facebook_likes         7
gross                        884
genres                         0
num_voted_users                0
cast_total_facebook_likes      0
facenumber_in_poster          13
num_user_for_reviews          21
language                      14
country                        5
content_rating               303
budget                       492
title_year                   108
actor_2_facebook_likes        13
imdb_score                     0
aspect_ratio                 329
movie_facebook_likes           0
dtype: int64

Percentage of nulls per feature

In [23]:
for key in dataset.keys():
    null_sum = dataset[key].isnull().sum() 
    if null_sum > 0:
        percentage = null_sum/dataset.shape[0] * 100
        print(f"\033[1m{key}:\033[0m {format_decimals(percentage)}%")
color: 0.38%
num_critic_for_reviews: 0.99%
duration: 0.3%
director_facebook_likes: 2.06%
actor_3_facebook_likes: 0.46%
actor_1_facebook_likes: 0.14%
gross: 17.53%
facenumber_in_poster: 0.26%
num_user_for_reviews: 0.42%
language: 0.28%
country: 0.1%
content_rating: 6.01%
budget: 9.76%
title_year: 2.14%
actor_2_facebook_likes: 0.26%
aspect_ratio: 6.52%
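The same percentages can also be obtained without an explicit loop. The sketch below uses a hypothetical toy frame and relies only on the fact that the mean of a boolean mask is the fraction of True values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values.
toy = pd.DataFrame({
    "gross": [1.0, np.nan, 3.0, np.nan],
    "budget": [np.nan, 2.0, 3.0, 4.0],
    "imdb_score": [7.1, 6.8, 8.5, 7.9],
})

# isnull().mean() gives the fraction of nulls per column; scale it to percent.
null_pct = toy.isnull().mean().mul(100).round(2)

# Keep only the columns that actually contain nulls.
null_pct = null_pct[null_pct > 0]
print(null_pct)
```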
In [24]:
fig, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(dataset.isnull(), cbar=False, cmap="viridis")
Out[24]:
<Axes: >
[Figure: heatmap of null values per column before imputation]

Data imputation¶

  • Mode imputation: fill nulls with the most frequent value for the columns color, facenumber_in_poster, language, aspect_ratio
In [25]:
features = ["color", "facenumber_in_poster", "language", "aspect_ratio"]

def impute_value_by_mode(variables):
    for var in variables:
        var_mode = dataset[var].mode()[0]
        print(f"{var}: {var_mode}")
        dataset[var] = dataset[var].fillna(var_mode)
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_by_mode(features)
color: Color
Validación de nulos para color: 0
facenumber_in_poster: 0.0
Validación de nulos para facenumber_in_poster: 0
language: English
Validación de nulos para language: 0
aspect_ratio: 2.35
Validación de nulos para aspect_ratio: 0
  • Category imputation: fill nulls in content_rating with the category "Unknown". (The original plan was to impute director_name, actor_1_name, actor_2_name, actor_3_name, and plot_keywords the same way, but those columns were dropped when the dirty columns were cleaned.)
In [26]:
# features = ["director_name", "actor_1_name", "actor_2_name", "actor_3_name", "plot_keywords", "content_rating"]
features = ["content_rating"]

def impute_value_by_category(variables):
    for var in variables:
        category = "Unknown"
        dataset[var] = dataset[var].fillna(category)
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_by_category(features)
Validación de nulos para content_rating: 0
  • Mean imputation: fill nulls with the column mean for num_critic_for_reviews, num_user_for_reviews
In [27]:
features = ["num_critic_for_reviews", "num_user_for_reviews"]

def impute_value_by_mean(variables):
    for var in variables:
        var_mean = dataset[var].mean()
        print(f"{var}: {var_mean}")
        dataset[var] = dataset[var].fillna(var_mean)
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_by_mean(features)
num_critic_for_reviews: 140.1942719807731
Validación de nulos para num_critic_for_reviews: 0
num_user_for_reviews: 272.77080844285143
Validación de nulos para num_user_for_reviews: 0
  • Median imputation: fill nulls with the column median for duration, title_year
In [28]:
features = ["duration", "title_year"]

def impute_value_by_median(variables):
    for var in variables:
        var_median = dataset[var].median()
        print(f"{var}: {var_median}")
        dataset[var] = dataset[var].fillna(var_median)
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_by_median(features)
duration: 103.0
Validación de nulos para duration: 0
title_year: 2005.0
Validación de nulos para title_year: 0
  • Zero imputation: fill nulls with 0 for the columns director_facebook_likes, actor_1_facebook_likes, actor_2_facebook_likes, actor_3_facebook_likes
In [29]:
features = ["director_facebook_likes", "actor_1_facebook_likes", "actor_2_facebook_likes", "actor_3_facebook_likes"]

def impute_value_with_zeros(variables):
    for var in variables:
        dataset[var] = dataset[var].fillna(0)
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_with_zeros(features)
Validación de nulos para director_facebook_likes: 0
Validación de nulos para actor_1_facebook_likes: 0
Validación de nulos para actor_2_facebook_likes: 0
Validación de nulos para actor_3_facebook_likes: 0
  • Interpolation imputation: linear interpolation between neighbouring rows for the columns gross, budget.
In [30]:
features = ["gross", "budget"]

def impute_value_by_interpolate(variables):
    for var in variables:
        dataset[var] = dataset[var].interpolate(method="linear")
        print(f"Validación de nulos para {var}: {dataset[var].isnull().sum()}")

impute_value_by_interpolate(features)
Validación de nulos para gross: 0
Validación de nulos para budget: 0
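Linear interpolation fills each gap from the values in the rows just before and after it, so the result depends on the order of the rows. The sketch below, on a hypothetical toy Series, shows that dependence. It illustrates the behaviour of interpolate and is not a claim about how the movie rows are ordered.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with gaps in the middle and at the end.
s = pd.Series([100.0, np.nan, np.nan, 400.0, np.nan])

# Gaps between two known values are filled on a straight line between them;
# a trailing gap is filled with the last known value.
filled = s.interpolate(method="linear")
print(filled.tolist())

# Reordering the rows changes the neighbours, and therefore the filled values.
reordered = s.iloc[[3, 1, 0, 2, 4]].reset_index(drop=True)
refilled = reordered.interpolate(method="linear")
print(refilled.tolist())
```

When row order carries no meaning, a statistic such as the median is a possible alternative, which is the approach the notebook already takes for duration and title_year.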
  • Conditional imputation: fill nulls in country based on the movie's language.
In [31]:
country_by_language = {
    'Aboriginal': "Australia", 
    'Arabic': random.choice(["Egypt", "Libya", "United Arab Emirates"]), 
    'Aramaic': random.choice(["Syria", "Iraq"]), 
    'Bosnian': "Bosnia",
    'Cantonese': random.choice(["Hong Kong", "China"]),
    'Chinese': "China", 
    'Czech': "Czech Republic", 
    'Danish': "Denmark", 
    'Dari': "Afghanistan", 
    'Dutch': random.choice(["Netherlands", "Belgium", "Aruba"]), 
    'Dzongkha': "Bhutan", 
    'English': random.choice(['USA', 'UK', 'New Zealand', 'Canada', 'Australia', 'Ireland', 'South Africa', 'Bahamas', 'Nigeria', 'Philippines']), 
    'Filipino': "Philippines", 
    'French': random.choice(["France", "Belgium", "Canada", "Switzerland", "Cameroon"]), 
    'German': random.choice(["Germany", "Austria", "Switzerland", "West Germany"]), 
    'Greek': "Greece", 
    'Hebrew': "Israel",
    'Hindi': "India", 
    'Hungarian': "Hungary", 
    'Icelandic': "Iceland", 
    'Indonesian': "Indonesia", 
    'Italian': random.choice(["Italy", "Switzerland"]), 
    'Japanese': "Japan", 
    'Kannada': "India",
    'Kazakh': "Kazakhstan", 
    'Korean': "South Korea", 
    'Mandarin': random.choice(["China", "Taiwan"]), 
    'Maya': "Mexico", 
    'Mongolian': "Mongolia", 
    'Norwegian': "Norway", 
    'Panjabi': random.choice(["Pakistan", "India"]), 
    'Persian': random.choice(["Iran", "Afghanistan"]), 
    'Polish': "Poland", 
    'Portuguese': random.choice(["Brazil", "Portugal"]), 
    'Romanian': "Romania", 
    'Russian': random.choice(["Russia", "Soviet Union", "Kyrgyzstan"]), 
    'Slovenian': "Slovenia", 
    'Spanish': random.choice(["Mexico", "Spain", "Argentina", "Colombia", "Chile", "Panama", "Peru", "Dominican Republic"]),
    'Swahili': "Kenya",
    'Swedish': random.choice(["Sweden", "Finland"]), 
    'Tamil': random.choice(["India", "Sri Lanka"]), 
    'Telugu': "India", 
    'Thai': "Thailand", 
    'Urdu': random.choice(["Pakistan", "India"]),
    'Vietnamese': "Vietnam", 
    'Zulu': "South Africa", 
}
In [32]:
def impute_country(row):
    if pd.isnull(row['country']):
        return country_by_language.get(row["language"], row['country'])
    return row['country']

dataset['country'] = dataset.apply(impute_country, axis=1) # type: ignore
print(f"Validación de nulos para country: {dataset['country'].isnull().sum()}")
Validación de nulos para country: 0
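In the dictionary above, each random.choice call runs once, when the dictionary is built. Every row with the same language therefore receives the same country, and that country can change from one run to the next. The sketch below is one possible variation, not the notebook's method: it draws a country per row from a seeded generator. The name candidates_by_language and the shortened candidate lists are hypothetical.

```python
import random
import pandas as pd

# Hypothetical, shortened candidate table; the notebook's dictionary is larger.
candidates_by_language = {
    "English": ["USA", "UK", "Canada"],
    "Spanish": ["Mexico", "Spain"],
    "Japanese": ["Japan"],
}

rng = random.Random(42)  # fixed seed so the imputation is reproducible

def impute_country(row):
    """Return the existing country, or draw one from the language's candidates."""
    if pd.isnull(row["country"]):
        options = candidates_by_language.get(row["language"])
        # Leave the value missing when the language has no candidates.
        return rng.choice(options) if options else row["country"]
    return row["country"]

toy = pd.DataFrame({
    "language": ["English", "Spanish", "Japanese", "English", "Klingon"],
    "country": [None, None, None, "Ireland", None],
})
toy["country"] = toy.apply(impute_country, axis=1)
print(toy)
```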

Checking for null values after imputation¶

Null counts

In [33]:
dataset.isnull().sum()
Out[33]:
color                        0
num_critic_for_reviews       0
duration                     0
director_facebook_likes      0
actor_3_facebook_likes       0
actor_1_facebook_likes       0
gross                        0
genres                       0
num_voted_users              0
cast_total_facebook_likes    0
facenumber_in_poster         0
num_user_for_reviews         0
language                     0
country                      0
content_rating               0
budget                       0
title_year                   0
actor_2_facebook_likes       0
imdb_score                   0
aspect_ratio                 0
movie_facebook_likes         0
dtype: int64

Null heatmap

In [34]:
fig, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(dataset.isnull(), cbar=False, cmap="viridis")
Out[34]:
<Axes: >
[Figure: heatmap of null values per column after imputation]

Exploratory Data Analysis (EDA)¶

In [35]:
is_interactive = False

Raw boxplot¶

A boxplot shows how the values in the dataset are distributed and where they concentrate. Points beyond the whiskers are movies with unusually high or low scores, known as outliers. The plot also reveals symmetry or skew, depending on whether the median sits in the center of the box or off to one side.

It also helps decide whether categories with too few records should be filtered out of the dataset.

In [36]:
if is_interactive:
    EDAVisualizerInteractive.plot_boxplot( "Comparación de IMDb Score por País", dataset, "country", "imdb_score", "País", "IMDb Score")
else:
    EDAVisualizerStatic.plot_boxplot( "Comparación de IMDb Score por País", dataset, "country", "imdb_score", "País", "IMDb Score")
[Figure: boxplot "Comparación de IMDb Score por País" (x: País, y: IMDb Score)]

Histograms¶

Observe how the target variable behaves in each country that has enough records in the dataset. A country is kept only if its number of records is strictly greater than the first quartile (Q1) of the country frequencies (the s > q1 comparison in the cell below).

In [37]:
country_counts = dataset["country"].value_counts()
q1 = country_counts.quantile(0.25)
filtered_countries = country_counts.loc[lambda s: s > q1].index.tolist()

if is_interactive:
    EDAVisualizerInteractive.plot_histogram_by_category(
        title=f"Distribución de IMDb Score en {filtered_countries[0]}" if filtered_countries else "Distribución de IMDb Score",
        subtitle="Distribución de IMDb Score en",
        data=dataset,
        category_col="country",
        value_col=target,
        categories=filtered_countries,
        nbins=20,
        x_label="IMDb Score",
        y_label="Frecuencia",
        save_html=True,
        save_folder="assets",
    )
else:
    for country in filtered_countries:
        EDAVisualizerStatic.plot_histogram(
            title=f"Histograma de IMDb Score para {country}",
            data=dataset[dataset["country"] == country],
            column=target,
            x_label="IMDb Score",
            y_label="Frecuencia",
            bins=20,
            x_range=(0, 10)
        )
[Figures: 37 histograms of IMDb Score, one per country in filtered_countries (x: IMDb Score, y: Frecuencia)]

Filtering the dataset¶

In [38]:
dataset_filtered = dataset[dataset["country"].isin(filtered_countries)]
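The frequency filter can be checked on a small example. This is a minimal sketch on hypothetical toy data; it mirrors the value_counts, quantile and isin steps used above.

```python
import pandas as pd

# Hypothetical toy data: two well-represented countries and three rare ones.
toy = pd.DataFrame({
    "country": ["USA"] * 6 + ["UK"] * 3 + ["France"] * 2 + ["Peru"] + ["Chile"],
    "imdb_score": [6.5] * 13,
})

counts = toy["country"].value_counts()      # USA 6, UK 3, France 2, Peru 1, Chile 1
q1 = counts.quantile(0.25)                  # first quartile of the frequencies
kept = counts[counts > q1].index.tolist()   # strict comparison, as in the notebook

toy_filtered = toy[toy["country"].isin(kept)]
print(q1, kept, len(toy_filtered))
```

Because the comparison is strict, countries whose count equals Q1 are dropped along with those below it.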

Filtered boxplot¶

Compare the countries with a boxplot, leaving out the countries that have few records.

In [39]:
if is_interactive:
    EDAVisualizerInteractive.plot_boxplot(
        title = "Comparación filtrada de IMDb Score por País",
        data = dataset_filtered, 
        x="country", y="imdb_score", x_label="País", y_label="IMDb Score"
    )
else:
    EDAVisualizerStatic.plot_boxplot(
        title = "Comparación filtrada de IMDb Score por País", 
        data = dataset_filtered,
        x="country", y="imdb_score", x_label="País", y_label="IMDb Score"
    )
[Figure: boxplot "Comparación filtrada de IMDb Score por País" (x: País, y: IMDb Score)]

Visualizing categorical features¶

In [40]:
if is_interactive:
    EDAVisualizerInteractive.plot_categorical_counts_dropdown(
        title="Frecuencia de categorías",
        data=dataset_filtered,
        categorical_cols=categorical_features,
        top_n=8,            # show only the top 8 when there are many categories
        template="plotly_white",
        save_html=True,     # saves to assets/Frecuencia_de_categorías.html
        save_folder="assets",
        show=True
    )
else:
    EDAVisualizerStatic.plot_categorical_counts_grid(
        data=dataset_filtered,
        categorical_cols=categorical_features,
        n_cols=2,
        top_n=8,
        rotate_xticks=45,   # rotate the x-axis tick labels
        show=True
    )
[Figure: grid of category frequency charts, one per categorical feature]

Visualizing numerical features¶

In [41]:
if is_interactive:
    EDAVisualizerInteractive.plot_numerical_hists_dropdown(
        title="Distribuciones numéricas",
        data=dataset_filtered,
        numerical_cols=numerical_features,
        nbins=30,
        template="plotly_white",
        save_html=True,      # saves to assets/Distribuciones_numéricas.html
        save_folder="assets",
        show=True
    )
else:
    n = 2
    plt.figure(figsize=[15, 3 * math.ceil(len(numerical_features) / n)])

    for i, col in enumerate(numerical_features):
        plt.subplot(math.ceil(len(numerical_features) / n), n, i + 1)
        sns.histplot(data=dataset_filtered, x=col, bins=30, kde=True, color='steelblue')
        plt.title(col)
        plt.xlabel(col)
        plt.ylabel("Frecuencia")

    plt.tight_layout(pad=2.0)
    plt.subplots_adjust(hspace=0.6, wspace=0.4)
    plt.show()
[Figure: histograms with KDE curves, one per numerical feature (y: Frecuencia)]

Scatter matrix¶

Understand the pairwise relationships among all the features.

In [42]:
if is_interactive:
    EDAVisualizerInteractive.plot_pairplot(
        dataset = dataset_filtered, 
        title = 'Matriz de dispersión interactiva',
        height = 2500,
        width = 2500,
    )
else:
    EDAVisualizerStatic.plot_pairplot(
        dataset = dataset_filtered, 
        title = 'Pairplots for all the Features'
    )
[Figure: pairplot of the features in dataset_filtered]

Removing outliers¶

Rows are removed when any numeric variable has an absolute z-score greater than 3. The z-score is a standardized value that indicates how many standard deviations a value lies from the mean of its distribution.

A z-score of 0 means the value is exactly at the mean. A z-score of +1 means the value is 1 standard deviation above the mean, and -2 means it is 2 standard deviations below the mean. A value with a z-score above +3 or below -3 is considered an outlier.

In [43]:
z_scores = np.abs(zscore(dataset_filtered.select_dtypes(include='number'))) # type: ignore
dataset_no_outliers = dataset_filtered[(z_scores < 3).all(axis=1)]
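The z-score rule can be written out directly from its definition, z = (x - mean) / std. The sketch below does this on hypothetical toy data, using pandas only. It uses the population standard deviation (ddof=0), which is the default of scipy.stats.zscore.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: 99 ordinary durations plus one extreme value.
toy = pd.DataFrame({
    "duration": np.append(rng.normal(100, 10, 99), 400.0),
    "label": ["a"] * 100,
})

# z = (x - mean) / std, computed per numeric column.
num = toy.select_dtypes(include="number")
z = (num - num.mean()) / num.std(ddof=0)

# Keep a row only if every numeric column is within 3 standard deviations.
toy_clean = toy[(z.abs() < 3).all(axis=1)]
print(len(toy), len(toy_clean))
```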

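Next, the IQR is computed over the numeric columns only, and rows with any value outside the range [Q1 - 1.5·IQR, Q3 + 1.5·IQR] are dropped. Because this filter starts again from dataset_filtered, it replaces the z-score result from the previous cell instead of refining it.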

In [44]:
numeric_cols = dataset_filtered.select_dtypes(include='number')

Q1 = numeric_cols.quantile(0.25)
Q3 = numeric_cols.quantile(0.75)
IQR = Q3 - Q1

dataset_no_outliers = dataset_filtered[~((numeric_cols < (Q1 - 1.5 * IQR)) | (numeric_cols > (Q3 + 1.5 * IQR))).any(axis=1)]
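The IQR fences are easier to see on a handful of values. This is a minimal sketch on hypothetical toy data that follows the same steps as the cell above, with the fences pulled out into named variables.

```python
import pandas as pd

# Hypothetical toy data with one extreme budget.
toy = pd.DataFrame({
    "budget": [10, 12, 11, 13, 12, 11, 10, 500],
    "title": list("abcdefgh"),
})
num = toy.select_dtypes(include="number")

q1, q3 = num.quantile(0.25), num.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# A row is an outlier if any numeric column falls outside its fences.
is_outlier = ((num < lower) | (num > upper)).any(axis=1)
toy_clean = toy[~is_outlier]
print(len(toy), len(toy_clean))
```

On strongly skewed columns the IQR fences tend to be tight, and a row is dropped as soon as a single column exceeds them. That is one plausible reason the filter removes a large share of the rows here.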
In [45]:
print('\n\033[1mInferencia: \033[0mAntes de remover outliers, el dataset tenía {} ejemplos.'.format(dataset_filtered.shape[0]))
print('Después de remover los outliers, el dataset ahora tiene {} ejemplos.'.format(dataset_no_outliers.shape[0]))
Inferencia: Antes de remover outliers, el dataset tenía 5015 ejemplos.
Después de remover los outliers, el dataset ahora tiene 2349 ejemplos.
In [46]:
if is_interactive:
    EDAVisualizerInteractive.plot_boxplot(
        title = "Comparación sin outliers de IMDb Score por País",
        data = dataset_no_outliers, 
        x="country", y="imdb_score", x_label="País", y_label="IMDb Score"
    )
else:
    EDAVisualizerStatic.plot_boxplot(
        title = "Comparación sin outliers de IMDb Score por País", 
        data = dataset_no_outliers,
        x="country", y="imdb_score", x_label="País", y_label="IMDb Score"
    )
[Figure: boxplot "Comparación sin outliers de IMDb Score por País" (x: País, y: IMDb Score)]

One-hot encoding¶

In [47]:
dataset_no_outliers = dataset_no_outliers.copy()

dataset_no_outliers['genres'] = dataset_no_outliers['genres'].fillna('')

dataset_no_outliers['genres_list'] = dataset_no_outliers['genres'].apply(lambda x: x.split('|'))

mlb = MultiLabelBinarizer()
genres_encoded = pd.DataFrame(mlb.fit_transform(dataset_no_outliers['genres_list']), columns=mlb.classes_, index=dataset_no_outliers.index) # type: ignore

dataset_no_outliers = pd.concat([dataset_no_outliers, genres_encoded], axis=1)
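Since a movie can carry several genres at once, each genre becomes its own 0/1 column rather than one column per combination. The sketch below shows the same split-and-binarize step on three hypothetical rows.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data in the same pipe-delimited format as the genres column.
toy = pd.DataFrame({"genres": ["Action|Adventure", "Drama", "Action|Drama|Thriller"]})

# Split each string into a list of labels per movie.
genre_lists = toy["genres"].apply(lambda s: s.split("|"))

mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(genre_lists),
    columns=mlb.classes_,   # one column per distinct genre, sorted
    index=toy.index,        # keep the original index so concat aligns rows
)
print(encoded)
```

Passing the original index matters in the notebook because dataset_no_outliers no longer has a contiguous index after filtering.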
In [48]:
dataset_no_outliers = dataset_no_outliers.drop('genres', axis=1)
dataset_no_outliers = dataset_no_outliers.drop('genres_list', axis=1)
categorical_features.remove("genres")
In [49]:
dataset_no_outliers.head()
Out[49]:
color num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_1_facebook_likes gross num_voted_users cast_total_facebook_likes facenumber_in_poster ... Music Musical Mystery News Romance Sci-Fi Sport Thriller War Western
177 Color 21.0 60.0 0.0 184.0 982.0 93655348.5 16769 1687 2.0 ... 0 0 1 0 0 0 0 1 0 0
215 Color 85.0 102.0 323.0 241.0 845.0 32694788.0 101411 1815 1.0 ... 0 0 0 0 0 0 0 0 0 0
242 Color 33.0 116.0 0.0 141.0 936.0 114038688.0 20567 1609 1.0 ... 0 0 0 0 0 0 0 0 0 0
306 Color 174.0 121.0 0.0 595.0 1000.0 66862068.0 89509 3903 0.0 ... 0 0 1 0 0 0 0 0 0 0
324 Color 97.0 110.0 342.0 393.0 623.0 10200000.0 18697 1722 0.0 ... 0 0 1 0 0 0 0 1 0 0

5 rows × 42 columns

In [50]:
dataset_no_outliers.info()
<class 'pandas.core.frame.DataFrame'>
Index: 2349 entries, 177 to 5042
Data columns (total 42 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      2349 non-null   object 
 1   num_critic_for_reviews     2349 non-null   float64
 2   duration                   2349 non-null   float64
 3   director_facebook_likes    2349 non-null   float64
 4   actor_3_facebook_likes     2349 non-null   float64
 5   actor_1_facebook_likes     2349 non-null   float64
 6   gross                      2349 non-null   float64
 7   num_voted_users            2349 non-null   int64  
 8   cast_total_facebook_likes  2349 non-null   int64  
 9   facenumber_in_poster       2349 non-null   float64
 10  num_user_for_reviews       2349 non-null   float64
 11  language                   2349 non-null   object 
 12  country                    2349 non-null   object 
 13  content_rating             2349 non-null   object 
 14  budget                     2349 non-null   float64
 15  title_year                 2349 non-null   float64
 16  actor_2_facebook_likes     2349 non-null   float64
 17  imdb_score                 2349 non-null   float64
 18  aspect_ratio               2349 non-null   float64
 19  movie_facebook_likes       2349 non-null   int64  
 20  Action                     2349 non-null   int64  
 21  Adventure                  2349 non-null   int64  
 22  Animation                  2349 non-null   int64  
 23  Biography                  2349 non-null   int64  
 24  Comedy                     2349 non-null   int64  
 25  Crime                      2349 non-null   int64  
 26  Documentary                2349 non-null   int64  
 27  Drama                      2349 non-null   int64  
 28  Family                     2349 non-null   int64  
 29  Fantasy                    2349 non-null   int64  
 30  History                    2349 non-null   int64  
 31  Horror                     2349 non-null   int64  
 32  Music                      2349 non-null   int64  
 33  Musical                    2349 non-null   int64  
 34  Mystery                    2349 non-null   int64  
 35  News                       2349 non-null   int64  
 36  Romance                    2349 non-null   int64  
 37  Sci-Fi                     2349 non-null   int64  
 38  Sport                      2349 non-null   int64  
 39  Thriller                   2349 non-null   int64  
 40  War                        2349 non-null   int64  
 41  Western                    2349 non-null   int64  
dtypes: float64(13), int64(25), object(4)
memory usage: 789.1+ KB

Dummy variables¶

In [51]:
# language_dummies = pd.get_dummies(dataset_no_outliers['language'], prefix='lang')
# dataset_no_outliers = dataset_no_outliers.drop('language', axis=1)
# dataset_no_outliers = pd.concat([dataset_no_outliers, language_dummies], axis=1)
In [52]:
dummies = pd.get_dummies(dataset_no_outliers[categorical_features], drop_first=True)
dataset_no_outliers = pd.concat([dataset_no_outliers.drop(columns=categorical_features), dummies], axis=1)
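When pd.get_dummies receives a DataFrame without a columns argument, it only encodes text and category columns and passes numeric columns through unchanged. That matters here because categorical_features also contains numeric columns with few distinct values. The sketch below shows both behaviours on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data: one text column and one low-cardinality numeric column.
toy = pd.DataFrame({
    "color": ["Color", "Black and White", "Color"],
    "aspect_ratio": [2.35, 1.85, 2.35],
})

# Default behaviour: only the text column is encoded, aspect_ratio stays numeric.
dummies = pd.get_dummies(toy, drop_first=True)
print(dummies.columns.tolist())

# Naming the columns explicitly forces the numeric column to be encoded too.
forced = pd.get_dummies(toy, columns=["color", "aspect_ratio"], drop_first=True)
print(forced.columns.tolist())
```

With drop_first=True the first level of each encoded column is omitted and acts as the reference category.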
In [53]:
dataset_no_outliers.head()
Out[53]:
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_1_facebook_likes gross num_voted_users cast_total_facebook_likes num_user_for_reviews budget ... content_rating_PG content_rating_PG-13 content_rating_R content_rating_TV-14 content_rating_TV-G content_rating_TV-MA content_rating_TV-PG content_rating_Unknown content_rating_Unrated content_rating_X
177 21.0 60.0 0.0 184.0 982.0 93655348.5 16769 1687 74.0 1500000.0 ... False False False True False False False False False False
215 85.0 102.0 323.0 241.0 845.0 32694788.0 101411 1815 546.0 85000000.0 ... False False True False False False False False False False
242 33.0 116.0 0.0 141.0 936.0 114038688.0 20567 1609 36.0 78000000.0 ... False False False False False False False True False False
306 174.0 121.0 0.0 595.0 1000.0 66862068.0 89509 3903 524.0 83000000.0 ... False False True False False False False False False False
324 97.0 110.0 342.0 393.0 623.0 10200000.0 18697 1722 263.0 10000000.0 ... True False False False False False False False False False

5 rows × 118 columns

In [54]:
dataset_no_outliers.info()
<class 'pandas.core.frame.DataFrame'>
Index: 2349 entries, 177 to 5042
Columns: 118 entries, num_critic_for_reviews to content_rating_X
dtypes: bool(80), float64(13), int64(25)
memory usage: 899.2 KB

Inferring variable types after one-hot encoding¶

In [55]:
target = 'imdb_score'
features = [i for i in dataset_no_outliers.columns if i not in [target]]
number_unique_rows = dataset_no_outliers[features].nunique()
In [56]:
numerical_features = []
categorical_features = []

for col in features:
    if dataset_no_outliers[col].dtype == 'object' or number_unique_rows[col] <= 45:
        categorical_features.append(col)
    else: 
        numerical_features.append(col)

print('\n\033[1mInferencia:\033[0m El dataset tiene {} features numéricas y {} features categóricas.'.format(len(numerical_features),len(categorical_features)))
Inferencia: El dataset tiene 12 features numéricas y 105 features categóricas.

Correlation matrix¶

In [57]:
correlation_matrix = dataset_no_outliers.select_dtypes(include=['int64', 'float64']).corr().round(2)

if is_interactive:
    EDAVisualizerInteractive.plot_heatmap(correlation_matrix, "Mapa de Calor Interactivo de Correlaciones", 1200, 1200, 7)
else:
    EDAVisualizerStatic.plot_heatmap(correlation_matrix, figsize=(50,25))
[Figure: heatmap of the correlation matrix]

Reshaping the correlation matrix into pair format

In [58]:
correlation_pairs = correlation_matrix.unstack()
correlation_pairs
Out[58]:
num_critic_for_reviews  num_critic_for_reviews     1.00
                        duration                   0.16
                        director_facebook_likes    0.21
                        actor_3_facebook_likes     0.13
                        actor_1_facebook_likes     0.20
                                                   ... 
aspect_ratio            Thriller                   0.14
                        War                        0.08
                        Western                    0.07
                        facenumber_in_poster       0.02
                        aspect_ratio               1.00
Length: 1444, dtype: float64

Keeping the variable pairs whose absolute correlation is greater than 0.6

In [59]:
threshold = 0.6
In [60]:
sorted_pairs = correlation_pairs.sort_values(ascending=False) # type: ignore

high_correlated_pairs = sorted_pairs[((sorted_pairs > threshold) | (sorted_pairs < -threshold)) & (sorted_pairs != 1)]
high_correlated_pairs
Out[60]:
actor_1_facebook_likes     cast_total_facebook_likes    0.98
cast_total_facebook_likes  actor_1_facebook_likes       0.98
actor_2_facebook_likes     actor_3_facebook_likes       0.85
actor_3_facebook_likes     actor_2_facebook_likes       0.85
num_voted_users            num_user_for_reviews         0.74
num_user_for_reviews       num_voted_users              0.74
num_critic_for_reviews     num_user_for_reviews         0.64
num_user_for_reviews       num_critic_for_reviews       0.64
dtype: float64
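Each pair appears twice in the output above because the correlation matrix is symmetric. One way to list each pair once is to keep only the strict upper triangle of the matrix before stacking. This is a sketch on hypothetical toy data, not the notebook's approach. It also drops the diagonal without comparing values against 1, so a genuine pair that rounds to 1.00 would not be lost.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is nearly a copy of "a", "c" is unrelated.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.1, 2.0, 2.9, 4.2, 5.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
corr = toy.corr()

# True above the diagonal, False on and below it.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Values outside the mask become NaN and never pass the threshold test.
pairs = corr.where(mask).stack()

threshold = 0.6
high = pairs[pairs.abs() > threshold].sort_values(ascending=False)
print(high)
```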
In [61]:
vars_corr = list(set([i[0] for i in high_correlated_pairs.index] + [i[1] for i in high_correlated_pairs.index]))

filtered_corr = correlation_matrix.loc[vars_corr, vars_corr]

if is_interactive:
    EDAVisualizerInteractive.plot_heatmap(filtered_corr, "Mapa de Calor de Pares Altamente Correlacionados", 800, 1200, 12)
else:
    EDAVisualizerStatic.plot_heatmap(filtered_corr, figsize=(20,10))
[Figure: heatmap of the highly correlated pairs]

The results above show several highly correlated pairs of variables, which can be interpreted as follows:

  • cast_total_facebook_likes and actor_1_facebook_likes → 0.98: The lead actor accounts for most of the cast's total likes. In linear models, consider dropping one of the two variables to avoid collinearity.
  • actor_2_facebook_likes and actor_3_facebook_likes → 0.85: Supporting actors tend to have similar popularity levels, possibly because they share the same type of role or level of exposure.
  • num_voted_users and num_user_for_reviews → 0.74: The more votes a film receives, the more user reviews it has. This may reflect popularity.
  • num_critic_for_reviews and num_user_for_reviews → 0.64: Films with more user reviews also tend to receive more attention from critics, which reflects media visibility.
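Each pair in the filtered output appears twice, once as (a, b) and once as (b, a), because the correlation matrix is symmetric. A minimal sketch of listing every pair only once follows, using the strict upper triangle of the matrix. The helper name `unique_high_corr_pairs` and the toy data are hypothetical and are not part of the notebook.

```python
import numpy as np
import pandas as pd


def unique_high_corr_pairs(corr: pd.DataFrame, threshold: float = 0.6) -> pd.Series:
    """Return each pair with |r| > threshold once, sorted by |r| descending."""
    # Strict upper triangle: drops the diagonal (r = 1) and the mirrored (b, a) entries.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).unstack().dropna()
    strong = pairs[pairs.abs() > threshold]
    order = strong.abs().sort_values(ascending=False).index
    return strong.reindex(order)


# Toy data (hypothetical): b is a noisy copy of a, c is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
toy = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=500),
    "c": rng.normal(size=500),
})

pairs = unique_high_corr_pairs(toy.corr(), threshold=0.6)
print(pairs)
```

Applied to `correlation_matrix`, the same helper would return four rows instead of eight, which makes it easier to decide which variable of each pair to drop.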

Dimensionality reduction¶

For dimensionality reduction I will use PCA (Principal Component Analysis). PCA simplifies a dataset with many numeric variables and removes redundancy: when several columns are correlated, it combines them into a smaller set of more informative components.
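How much of the variance each component retains can be read from `explained_variance_ratio_`. The sketch below uses synthetic data (an assumption, not the movie dataset): three columns share one latent factor and two columns are pure noise, so the first component should absorb most of the shared variation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 1))

# Three columns driven by the same latent factor, plus two independent noise columns.
X_toy = np.hstack([
    latent + 0.2 * rng.normal(size=(300, 3)),
    rng.normal(size=(300, 2)),
])

# Standardize first so that no column dominates because of its scale.
X_std = StandardScaler().fit_transform(X_toy)
pca_toy = PCA().fit(X_std)

ratios = pca_toy.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print("Explained variance ratio:", np.round(ratios, 3))
print("Cumulative:", np.round(cumulative, 3))
```

Checking the cumulative ratio on the real data would indicate how much information two or three components actually keep.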

In [62]:
x_num = dataset_no_outliers.drop(columns=['imdb_score']).select_dtypes(include=['int64', 'float64']).dropna()
x_scaled = StandardScaler().fit_transform(x_num)

Reduction to 2D¶

In [63]:
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x_scaled)
In [64]:
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=dataset_no_outliers[target], cmap='viridis')
plt.xlabel('Componente 1')
plt.ylabel('Componente 2')
plt.title('Proyección PCA coloreada por IMDb Score')
plt.colorbar(label='IMDb Score')
plt.show()
[Figure: PCA projection colored by IMDb score]
In [65]:
loadings = pd.DataFrame(pca.components_.T, 
                        columns=['PC1', 'PC2'], 
                        index=x_num.columns)

loadings.head(10)
Out[65]:
PC1 PC2
num_critic_for_reviews 0.279405 -0.164931
duration 0.141880 -0.235369
director_facebook_likes 0.128136 -0.083867
actor_3_facebook_likes 0.286269 0.114801
actor_1_facebook_likes 0.288075 -0.032980
gross 0.278304 0.163043
num_voted_users 0.364319 -0.044537
cast_total_facebook_likes 0.320105 -0.004969
num_user_for_reviews 0.341804 -0.124066
budget 0.313881 0.131411
In [66]:
loadings['PC1'].sort_values(ascending=False).head(10)
Out[66]:
num_voted_users              0.364319
num_user_for_reviews         0.341804
cast_total_facebook_likes    0.320105
budget                       0.313881
actor_2_facebook_likes       0.304629
actor_1_facebook_likes       0.288075
actor_3_facebook_likes       0.286269
num_critic_for_reviews       0.279405
gross                        0.278304
Action                       0.145027
Name: PC1, dtype: float64

Principal component 1 (PC1) captures the largest share of the variance, that is, it is the direction of maximum variation in the data. Here, the variables with the largest loadings relate to popularity, media visibility, economic impact, and social-media presence.

Based on these categories, PC1 can be read as a dimension of "public exposure and reception": how visible, voted on, reviewed, and commercially successful a film is. Films with high PC1 values tend to receive more votes and reviews, to be more commercially successful, and to be more visible on social media.

In [67]:
loadings['PC2'].sort_values(ascending=False).head(10)
Out[67]:
Family                    0.402911
Animation                 0.339887
Comedy                    0.321957
Adventure                 0.289094
Fantasy                   0.264886
gross                     0.163043
Musical                   0.150288
budget                    0.131411
actor_3_facebook_likes    0.114801
actor_2_facebook_likes    0.095359
Name: PC2, dtype: float64

Principal component 2 (PC2) is the second direction of maximum variation in the data, orthogonal to PC1. Here it captures the thematic and stylistic dimension of the films. Unlike PC1, PC2 reflects narrative content and target audience, so it can be read as a "narrative and audience" dimension. Films with high PC2 values tend to target broad or family audiences, drawing on animation, fantasy, adventure, and comedy. The small positive loadings of gross and budget also suggest an association with higher-budget productions that take a lighter narrative approach.

Reduction to 3D¶

In [68]:
pca = PCA(n_components=3)
x_pca = pca.fit_transform(x_scaled)
In [69]:
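dataset_pca = pd.DataFrame(x_pca, columns=['PC1', 'PC2', 'PC3'])
# Assign by position: dataset_pca has a fresh 0..n-1 index, so assigning the Series
# directly would align on index labels and could introduce NaN values.
dataset_pca['IMDb Score'] = dataset_no_outliers.loc[x_num.index, 'imdb_score'].to_numpy()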


if is_interactive:
    EDAVisualizerInteractive.plot_3d_projection(
        dataset=dataset_pca,
        title='Proyección PCA en 3D (Interactiva)',
        x_label="PC1",
        y_label="PC2",
        z_label="PC3",
        label="IMDb Score"
    )
else:
    EDAVisualizerStatic.plot_3d_projection(
        dataset=dataset_no_outliers, 
        column='imdb_score', 
        title='Proyección PCA en 3D coloreada por IMDb Score',
        x_label="Componente 1",
        y_label="Componente 2",
        z_label="Componente 3",
        cbar_label="IMDb Score",
        figsize=(10, 8)
    )
[Figure: 3D PCA projection colored by IMDb score]
In [70]:
loadings = pd.DataFrame(pca.components_.T, 
                        columns=['PC1', 'PC2', 'PC3'], 
                        index=x_num.columns)

loadings.head(10)
Out[70]:
PC1 PC2 PC3
num_critic_for_reviews 0.279405 -0.164931 0.116179
duration 0.141880 -0.235369 -0.147009
director_facebook_likes 0.128136 -0.083867 0.007356
actor_3_facebook_likes 0.286269 0.114801 -0.201801
actor_1_facebook_likes 0.288075 -0.032980 -0.236891
gross 0.278304 0.163043 0.072336
num_voted_users 0.364319 -0.044537 0.083702
cast_total_facebook_likes 0.320105 -0.004969 -0.256920
num_user_for_reviews 0.341804 -0.124066 0.134412
budget 0.313881 0.131411 0.069345
In [71]:
loadings['PC1'].sort_values(ascending=False).head(10)
Out[71]:
num_voted_users              0.364319
num_user_for_reviews         0.341804
cast_total_facebook_likes    0.320105
budget                       0.313881
actor_2_facebook_likes       0.304629
actor_1_facebook_likes       0.288075
actor_3_facebook_likes       0.286269
num_critic_for_reviews       0.279405
gross                        0.278304
Action                       0.145027
Name: PC1, dtype: float64
In [72]:
loadings['PC2'].sort_values(ascending=False).head(10)
Out[72]:
Family                    0.402911
Animation                 0.339887
Comedy                    0.321957
Adventure                 0.289094
Fantasy                   0.264886
gross                     0.163043
Musical                   0.150288
budget                    0.131411
actor_3_facebook_likes    0.114801
actor_2_facebook_likes    0.095359
Name: PC2, dtype: float64
In [73]:
loadings['PC3'].sort_values(ascending=False).head(10)
Out[73]:
Horror                    0.321810
Thriller                  0.308382
Sci-Fi                    0.240482
Mystery                   0.231936
Action                    0.212305
Adventure                 0.153874
Fantasy                   0.152227
num_user_for_reviews      0.134412
num_critic_for_reviews    0.116179
Animation                 0.103772
Name: PC3, dtype: float64

Component 3 (PC3) captures subtler, more specific aspects, such as combinations of genres or narrative themes. Its largest loadings (Horror, Thriller, Sci-Fi, Mystery) point to a dimension of narrative intensity and thematic darkness. It also contrasts with PC2, which makes it possible to compare productions with high emotional impact against lighter, family-oriented ones.

PC2 and PC3 together support a thematic segmentation, since they separate films by narrative tone. Comparing PC1 with the IMDb score shows whether popularity is associated with quality.
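One way to make the PC1 versus score comparison concrete is a Pearson correlation between the PC1 coordinates and the target. The following is a minimal sketch on synthetic data; the column names `votes`, `reviews`, `likes` and the series `score_toy` are hypothetical stand-ins. Because the sign of a principal component is arbitrary, only the magnitude of the correlation is meaningful.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 400
popularity = rng.normal(size=n)  # hidden factor shared by the three features

features_toy = pd.DataFrame({
    "votes": popularity + 0.3 * rng.normal(size=n),
    "reviews": popularity + 0.3 * rng.normal(size=n),
    "likes": popularity + 0.3 * rng.normal(size=n),
})
score_toy = pd.Series(0.5 * popularity + rng.normal(size=n), name="score")

scaled = StandardScaler().fit_transform(features_toy)
coords = PCA(n_components=2).fit_transform(scaled)

# Reuse the original index so that PC coordinates and target stay aligned row by row.
pcs = pd.DataFrame(coords, columns=["PC1", "PC2"], index=features_toy.index)
r_pc1 = pcs["PC1"].corr(score_toy)
print(f"Pearson r between PC1 and score: {r_pc1:.2f}")
```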

This PCA condenses the information from many variables into three principal components that capture latent dimensions of the movie dataset: the first reflects public impact and mass reception (votes, reviews, commercial success), the second groups family and fantasy themes (genres such as animation, adventure, and comedy), and the third represents intense, dark narratives (horror, thriller, mystery).

Multiple linear regression¶

Linear regression with all variables and without PCA¶

In [74]:
X = dataset_no_outliers.drop(columns=['imdb_score'])
y = dataset_no_outliers['imdb_score']
In [88]:
linearRegressionWithoutPCA = CustomLinearRegression(X, y, "Linear_regression_without_PCA", is_interactive)
linearRegressionWithoutPCA.run()
--> Iniciando la division del dataset
	Tamaño del dataset: 2349
	Tamaño del dataset de entrenamiento: 1879
	Tamaño del dataset de prueba: 470
----------------------------------------
--> Iniciando el entrenamiento del modelo
----------------------------------------
--> Iniciando la predicción del modelo
----------------------------------------
--> Iniciando la evaluación del modelo
	Error absoluto medio (MAE): 0.55
	Error cuadrático medio (MSE): 0.49
	Coeficiente de determinación (R²): 0.37
----------------------------------------
--> Iniciando la creación del dataframe de coeficientes
----------------------------------------
--> Prediciendo sobre entrenamiento y prueba
----------------------------------------
--> Graficando comparación del modelo
----------------------------------------
--> Graficando residuos
----------------------------------------
--> Graficando importance de variables
----------------------------------------
In [76]:
linearRegressionWithoutPCA.summary()
RESUMEN DEL MODELO
----------------------------------------
--> Métricas de desempeño:
Total Features MAE MSE R2
0 117 0.554463 0.489728 0.370996
--> Principales coeficientes:
Variable Coeficiente
110 content_rating_TV-14 1.993295
112 content_rating_TV-MA 1.904206
84 country_Iran 1.774873
111 content_rating_TV-G 1.253610
50 language_Hebrew 1.165313
19 Documentary 1.098341
54 language_Japanese 0.921692
60 language_Norwegian 0.910126
113 content_rating_TV-PG 0.908675
44 language_Dutch 0.767355

Linear regression with PCA¶

In [77]:
X = x_pca
y = dataset_no_outliers['imdb_score']
In [89]:
linearRegressionWithPCA = CustomLinearRegression(X, y, "Linear_regression_with_PCA", is_interactive)
linearRegressionWithPCA.run()
--> Iniciando la division del dataset
	Tamaño del dataset: 2349
	Tamaño del dataset de entrenamiento: 1879
	Tamaño del dataset de prueba: 470
----------------------------------------
--> Iniciando el entrenamiento del modelo
----------------------------------------
--> Iniciando la predicción del modelo
----------------------------------------
--> Iniciando la evaluación del modelo
	Error absoluto medio (MAE): 0.69
	Error cuadrático medio (MSE): 0.74
	Coeficiente de determinación (R²): 0.05
----------------------------------------
--> Iniciando la creación del dataframe de coeficientes
----------------------------------------
--> Prediciendo sobre entrenamiento y prueba
----------------------------------------
--> Graficando comparación del modelo
----------------------------------------
--> Graficando residuos
----------------------------------------
--> Graficando importance de variables
----------------------------------------
In [79]:
linearRegressionWithPCA.summary()
RESUMEN DEL MODELO
----------------------------------------
--> Métricas de desempeño:
Total Features MAE MSE R2
0 3 0.694786 0.741951 0.047041
--> Principales coeficientes:
Variable Coeficiente
0 PC1 -0.023093
2 PC3 -0.105278
1 PC2 -0.163191

Tests with other models¶

Random Forest regression¶

In [80]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]

# Train the model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)

# Predict on the same data used for training (in-sample evaluation)
y_pred_rf = rf.predict(X)

# Evaluation metrics
mse = mean_squared_error(y, y_pred_rf)
r2 = r2_score(y, y_pred_rf)

print(f"Error cuadrático medio (MSE): {mse:.2f}")
print(f"Coeficiente de determinación (R²): {r2:.2f}")
Error cuadrático medio (MSE): 0.07
Coeficiente de determinación (R²): 0.92
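The scores above are in-sample: the model predicts the same rows it was trained on, which tends to flatter flexible models such as random forests. A minimal sketch of a held-out evaluation follows, on synthetic data generated with `make_regression` (an assumption, not the movie dataset), to show how the two numbers can diverge.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem with noisy targets (hypothetical stand-in data).
X_toy, y_toy = make_regression(
    n_samples=400, n_features=10, n_informative=4, noise=30.0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

rf_toy = RandomForestRegressor(n_estimators=100, random_state=42)
rf_toy.fit(X_train, y_train)

# Compare the score on rows seen during training with the score on unseen rows.
r2_train = r2_score(y_train, rf_toy.predict(X_train))
r2_test = r2_score(y_test, rf_toy.predict(X_test))
print(f"R² on training rows: {r2_train:.2f}")
print(f"R² on held-out rows: {r2_test:.2f}")
```

The held-out score is the one comparable to the cross-validated results reported later in the notebook.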

Gradient Boosting regression¶

In [81]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]

# Train the model
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb.fit(X, y)

# Predict on the same data used for training (in-sample evaluation)
y_pred_gb = gb.predict(X)

# Evaluation metrics
mse_gb = mean_squared_error(y, y_pred_gb)
r2_gb = r2_score(y, y_pred_gb)

print(f"Error cuadrático medio (MSE): {mse_gb:.2f}")
print(f"Coeficiente de determinación (R²): {r2_gb:.2f}")
Error cuadrático medio (MSE): 0.31
Coeficiente de determinación (R²): 0.65

Support Vector Machine¶

In [82]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]

# Train the SVR model
svr = SVR(kernel='rbf', C=1.0, epsilon=0.2)
svr.fit(X, y)

# Predict on the same data used for training (in-sample evaluation)
y_pred_svr = svr.predict(X)

# Evaluation metrics
mse_svr = mean_squared_error(y, y_pred_svr)
r2_svr = r2_score(y, y_pred_svr)

print(f"Error cuadrático medio (MSE): {mse_svr:.2f}")
print(f"Coeficiente de determinación (R²): {r2_svr:.2f}")
Error cuadrático medio (MSE): 0.84
Coeficiente de determinación (R²): 0.04

K Nearest Neighbors¶

In [83]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]

# Train the KNN model
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)

# Predict on the same data used for training (in-sample evaluation)
y_pred_knn = knn.predict(X)

# Evaluation metrics
mse_knn = mean_squared_error(y, y_pred_knn)
r2_knn = r2_score(y, y_pred_knn)

print(f"Error cuadrático medio (MSE): {mse_knn:.2f}")
print(f"Coeficiente de determinación (R²): {r2_knn:.2f}")
Error cuadrático medio (MSE): 0.66
Coeficiente de determinación (R²): 0.25

Ridge¶

In [84]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]

# Train the Ridge model
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# Predict on the same data used for training (in-sample evaluation)
y_pred_ridge = ridge.predict(X)

# Evaluation metrics
mse_ridge = mean_squared_error(y, y_pred_ridge)
r2_ridge = r2_score(y, y_pred_ridge)

print(f"Error cuadrático medio (MSE Ridge): {mse_ridge:.2f}")
print(f"Coeficiente de determinación (R² Ridge): {r2_ridge:.2f}")
Error cuadrático medio (MSE Ridge): 0.46
Coeficiente de determinación (R² Ridge): 0.47
c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=3.20624e-19): result may not be accurate.

Test with high-correlation features¶

In [85]:
X = dataset_no_outliers[features]
y = dataset_no_outliers[target]
In [91]:
models = {
    'Linear': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42),
    'SVR': SVR(kernel='rbf', C=1.0, epsilon=0.2),
    'K Neighbors': KNeighborsRegressor(n_neighbors=5),
    'Ridge': Ridge(alpha=1.0)
}

results = {
    'Modelo': [],
    'R²': [],
    'MSE': []
}

for name, model in models.items():
    # cross_val_score returns one score per fold; average over the 5 folds
    r2_mean = cross_val_score(model, X, y, cv=5, scoring='r2').mean()
    # scikit-learn reports MSE negated, so flip the sign exactly once
    mse_mean = -cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error').mean()
    print(f"{name} → R²: {r2_mean:.3f}, MSE: {mse_mean:.3f}")

    results['Modelo'].append(name)
    results['R²'].append(round(r2_mean, 3))
    results['MSE'].append(round(mse_mean, 2))


fig = go.Figure()

# R²
fig.add_trace(go.Bar(
    x=results['Modelo'],
    y=results['R²'],
    name='R²',
    marker_color='mediumseagreen',
    text=results['R²'],
    textposition='auto'
))

# MSE
fig.add_trace(go.Bar(
    x=results['Modelo'],
    y=results['MSE'],
    name='MSE',
    marker_color='indianred',
    text=results['MSE'],
    textposition='auto'
))

# Chart layout
fig.update_layout(
    title='Comparación de Modelos de Regresión (Validación Cruzada)',
    barmode='group',
    xaxis_title='Modelo',
    yaxis_title='Valor',
    template='plotly_white',
    width=1000,
    height=600
)

fig.show()

fig_title = "assets/Validacion_cruzada_modelos.html"
fig.write_html(fig_title)
print("Gráfica guardada en", fig_title)
Linear → R²: 0.352, MSE: 0.555
Random Forest → R²: 0.398, MSE: 0.513
Gradient Boosting → R²: 0.438, MSE: 0.482
SVR → R²: -0.024, MSE: 0.868
K Neighbors → R²: -0.167, MSE: 0.990
Ridge → R²: 0.373, MSE: 0.536
c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=6.0454e-19): result may not be accurate.

c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=3.5912e-19): result may not be accurate.

c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=3.50027e-19): result may not be accurate.

c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=3.17022e-19): result may not be accurate.

c:\Users\cpaez\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy\_lib\_util.py:1233: LinAlgWarning:

Ill-conditioned matrix (rcond=3.6825e-19): result may not be accurate.

Gráfica guardada en assets/Validacion_cruzada_modelos.html

How to interpret the results

  • R² (coefficient of determination): measures the proportion of the target variable's variance that the model explains.
    • Values close to 1 indicate a good fit.
    • Negative values (as with SVR and KNN) mean the model performs worse than always predicting the mean of the target.
  • MSE (mean squared error): the average of the squared errors.
    • The lower, the better.
    • It is sensitive to extreme values.
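Both metrics can be checked by hand on a tiny example. The sketch below uses made-up scores (an assumption for illustration only) and computes R² as 1 - SS_res / SS_tot, showing that predicting the mean gives R² = 0 and that a predictor worse than the mean gives a negative R².

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([6.0, 7.0, 8.0, 5.0])          # made-up target values, mean 6.5
y_mean = np.full_like(y_true, y_true.mean())     # baseline: always predict the mean
y_bad = np.array([8.0, 5.0, 6.0, 7.0])           # predictions worse than the baseline


def r2_manual(y: np.ndarray, y_hat: np.ndarray) -> float:
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


r2_baseline = r2_manual(y_true, y_mean)
r2_worse = r2_manual(y_true, y_bad)
mse_baseline = mean_squared_error(y_true, y_mean)
mse_worse = mean_squared_error(y_true, y_bad)

print(f"Mean predictor: R² = {r2_baseline:.2f}, MSE = {mse_baseline:.2f}")  # R² = 0.00, MSE = 1.25
print(f"Bad predictor:  R² = {r2_worse:.2f}, MSE = {mse_worse:.2f}")        # R² = -2.20, MSE = 4.00
```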

Technical diagnosis

  • Gradient Boosting is the best model on this dataset, with the highest R² (0.438) and the lowest MSE (0.482) under 5-fold cross-validation.
  • SVR and KNN perform poorly, probably because of:
    • Unscaled variables (both models are sensitive to feature scale).
    • High dimensionality or noise in the data.
  • Ridge improves slightly on Linear (R² 0.373 vs. 0.352) by controlling collinearity.
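The scaling hypothesis can be tested by wrapping SVR and KNN in a pipeline that standardizes the features inside each cross-validation fold. The following is a minimal sketch on synthetic data (an assumption, not the movie dataset): one informative feature on a unit scale and one uninformative feature on a much larger scale, loosely mimicking raw like counts.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300
signal = rng.normal(size=n)              # informative feature, unit scale
big = rng.normal(scale=1e4, size=n)      # uninformative feature, very large scale
X_toy = np.column_stack([signal, big])
y_toy = 2.0 * signal + 0.1 * rng.normal(size=n)

estimators = {
    "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.2),
    "K Neighbors": KNeighborsRegressor(n_neighbors=5),
}

cv_r2 = {}
for name, est in estimators.items():
    # Raw features: distances and kernels are dominated by the large-scale column.
    raw = cross_val_score(est, X_toy, y_toy, cv=5, scoring="r2").mean()
    # Pipeline: the scaler is fitted on each training fold only, avoiding leakage.
    scaled = cross_val_score(
        make_pipeline(StandardScaler(), est), X_toy, y_toy, cv=5, scoring="r2"
    ).mean()
    cv_r2[name] = (raw, scaled)
    print(f"{name}: R² raw = {raw:.3f}, R² scaled = {scaled:.3f}")
```

If the same pattern appears on the movie features, the negative R² of SVR and KNN would be attributable to scale rather than to the models themselves.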